Weikuan Yu

Research focus

Architecture Design (Transformers, SSMs, MoE) (2)
Distributed Systems & Hardware (2)
Inference & Quantization (2)

Frequent co-authors

Bodon Jeong (1)
Bodon Jeong (1)
H.I. Byun (1)
Hongsu Byun (1)

Papers (2)

Apr 29, 2026

DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

DUAL-BLADE's dual-path KV-cache offloading cuts edge LLM inference latency by up to 42% and doubles SSD utilization.

Bodon Jeong, Bodon Jeong, H.I. Byun +6

Architecture Design (Transformers, SSMs, MoE)
Distributed Systems & Hardware
Inference & Quantization

Apr 6, 2026

Comparative Characterization of KV Cache Management Strategies for LLM Inference

Stop guessing which KV cache optimization to use: this benchmark reveals exactly when vLLM, InfiniGen, or H2O will give you the best latency, throughput, and memory footprint for your LLM inference workload.

Oteo Mamo, Olga Kogiou, Hyunji Yi +1