Peking University, PKU
Sub-2-bit LLMs can now achieve state-of-the-art performance thanks to pQuant, which selectively preserves sensitive parameters in a high-precision branch during quantization-aware training.
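The core idea behind this style of quantization — keep a small set of high-magnitude "sensitive" weights in full precision while the bulk of the tensor goes to very low bit-width — can be illustrated with a toy NumPy sketch. This is not pQuant's actual algorithm (its branch selection and training procedure are not described here); the `split_quantize` helper, the top-magnitude selection rule, and the 1% keep fraction are all illustrative assumptions.

```python
import numpy as np

def quantize(w, bits):
    # Uniform symmetric quantization to `bits` bits.
    levels = 2 ** (bits - 1) - 1
    m = np.abs(w).max()
    scale = m / levels if m > 0 else 1.0
    return np.clip(np.round(w / scale), -levels, levels) * scale

def split_quantize(w, bits=2, keep_frac=0.01):
    # Illustrative stand-in for a high-precision branch: keep the
    # top-|w| fraction exactly, quantize the remainder. Removing the
    # outliers first also shrinks the quantizer's scale.
    k = max(1, int(keep_frac * w.size))
    idx = np.argsort(np.abs(w).ravel())[-k:]
    mask = np.zeros(w.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(w.shape)
    return np.where(mask, w, quantize(np.where(mask, 0.0, w), bits))

rng = np.random.default_rng(0)
w = rng.standard_normal(4096)
w[:4] *= 50  # a few synthetic "sensitive" outlier weights
err_plain = np.abs(w - quantize(w, 2)).mean()
err_split = np.abs(w - split_quantize(w, 2, 0.01)).mean()
```

Because the outliers dominate the quantization scale, plain 2-bit rounding collapses most weights to zero; shunting them to a high-precision branch cuts the mean reconstruction error substantially.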
Multi-round LLM inference gets a major speed boost with AMPD, a new disaggregated serving framework that intelligently manages interleaved prefill-decode workloads.
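Prefill-decode disaggregation in general means routing the two inference phases to separate worker pools, since prefill is compute-bound and one-shot while decode is memory-bound and iterative. The toy scheduler below sketches only that general pattern, not AMPD's design; the class and field names (`DisaggregatedScheduler`, `Request`, queue-per-phase structure) are invented for illustration.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    prompt_len: int
    max_new: int
    generated: int = 0

class DisaggregatedScheduler:
    """Toy model: prefill and decode run on separate pools.
    A request is prefilled once, then migrates to the decode
    pool, where the batch advances one token per step."""
    def __init__(self):
        self.prefill_q = deque()
        self.decode_q = deque()
        self.finished = []

    def submit(self, req):
        self.prefill_q.append(req)

    def step(self):
        # Prefill worker: process one full prompt, hand off to decode.
        if self.prefill_q:
            self.decode_q.append(self.prefill_q.popleft())
        # Decode worker: emit one token for every active request.
        still = deque()
        while self.decode_q:
            req = self.decode_q.popleft()
            req.generated += 1
            (self.finished if req.generated >= req.max_new else still).append(req)
        self.decode_q = still

sched = DisaggregatedScheduler()
for i in range(3):
    sched.submit(Request(rid=i, prompt_len=128, max_new=2))
for _ in range(5):
    sched.step()
```

In a real disaggregated system the handoff also transfers the prefilled KV cache between pools, and a multi-round workload re-enters the prefill pool with each new user turn — the scheduling of that interleaving is where a framework like AMPD does its work.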