Jinda Jia

Papers on Lattice

Total citations

Topics

h-index

Research focus

Distributed Systems & Hardware (1), Inference & Quantization (1)

Frequent co-authors

Zhongzhu Zhou (1), Junghwa Heo (1), Jue Wang (1), Tri Dao (1)

Papers (1)

Apr 21, 2026

Jinda Jia +7 · Apr 21, 2026 · also Tsinghua AI, Peng Cheng Laboratory, Sydney, Together

SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving

Forget fancy quantization schemes – a simple token-wise INT4 quantization with Hadamard rotation is all you need to nearly match FP16 accuracy in LLM serving, without sacrificing throughput.

Jinda Jia, Zhongzhu Zhou, Junghwa Heo +5

Distributed Systems & Hardware, Inference & Quantization
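
The summary of SAW-INT4 above names two ingredients: a Hadamard rotation and token-wise INT4 quantization. The following is a minimal NumPy sketch of that general idea, not the paper's implementation. All function names are hypothetical, the head dimension is assumed to be a power of two, the symmetric code range [-7, 7] is an assumption, and the 4-bit codes are held in `int8` rather than packed two per byte.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via the Sylvester construction (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def quantize_tokenwise_int4(x: np.ndarray):
    """Symmetric per-token quantization of x with shape (tokens, head_dim).

    Returns integer codes in [-7, 7] (stored as int8) and one scale per token.
    """
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid dividing by zero on all-zero rows
    q = np.clip(np.rint(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float64) * scale

# Synthetic stand-in for a KV-cache slice, with one outlier channel (an assumption).
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))
x[:, 3] *= 50.0

h = hadamard(64)

# Baseline: quantize the raw values token by token.
q_plain, s_plain = quantize_tokenwise_int4(x)
err_plain = np.mean((dequantize(q_plain, s_plain) - x) ** 2)

# Rotated: spread the outlier across channels, quantize, then rotate back.
q_rot, s_rot = quantize_tokenwise_int4(x @ h)
x_rec = dequantize(q_rot, s_rot) @ h.T
err_rot = np.mean((x_rec - x) ** 2)

print(f"MSE without rotation: {err_plain:.4f}")
print(f"MSE with rotation:    {err_rot:.4f}")
```

Because the rotation is orthonormal, it can be undone exactly with `h.T`, so the only loss comes from rounding. The rotation spreads a single large channel across all coordinates of a token, which is the usual motivation for pairing it with a per-token scale.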
