Lattice: the structure behind the noise


Created by Flynn Lachendro


Yilong Zhao | Lattice


Papers on Lattice: 3
Total citations: 0
Topics: 4
h-index: 0

Publication activity (papers/week, last 8 weeks)

Research focus

Distributed Systems & Hardware (3)
Inference & Quantization (3)
Architecture Design (Transformers, SSMs, MoE) (1)
Training Efficiency & Optimization (1)

Frequent co-authors

Fangxin Liu (1)
O. Mutlu (1)
Mingyu Gao (1)
Jian Liu (1)

Papers (3)

Jun 29, 2026

Tsinghua AI · 3w ago · also Beihang, Shanghai Qi Zhi Institute, SJTU

COSM: A Cooperative Scheduling Framework for Concurrent PIM and CPU Execution on Mobile Devices

COSM improves PIM throughput by 2.8x while keeping CPU performance degradation under 2.0%.

Yilong Zhao, Fangxin Liu, O. Mutlu +4

Distributed Systems & Hardware · Inference & Quantization

Apr 14, 2026

Hongyi Jin +19

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

Unlock 2x faster LLM serving and slash warmup times by fusing kernels that gracefully handle dynamic shapes and data dependencies.

Hongyi Jin, Bohan Hou, Guanjie Wang +17

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Mar 10, 2026

BAIR · also NVIDIA, Tsinghua AI, Soyeon Caren Han is the corresponding

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

K-means, previously relegated to offline processing, gets a 17.9x speed boost on modern GPUs thanks to Flash-KMeans' clever IO and contention optimizations.

Shuo Yang, Shuo Yang, Haocheng Xi +16

Distributed Systems & Hardware · Inference & Quantization · Training Efficiency & Optimization

Additional frequent co-author: Haibing Guan (1)