University of Electronic Science and Technology of China
Skewed item distributions in recommender systems can be tamed with learnable non-uniform quantization, yielding better codebook utilization and more accurate generative recommendations.
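The summary above doesn't specify the paper's quantizer, but the core idea can be sketched with a stand-in: on a long-tailed distribution, codewords learned from the data (here via simple Lloyd/k-means updates, an assumed proxy for the paper's learnable scheme) place more codes where the mass is, improving utilization and reconstruction over uniform bins. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Skewed item-popularity scores (long tail), as in recommendation data.
x = rng.exponential(scale=1.0, size=5000)

K = 16  # codebook size (toy choice)

# Baseline: uniform quantizer with evenly spaced codewords over the range.
uniform_codes = np.linspace(x.min(), x.max(), K)

def assign(x, codes):
    """Nearest-codeword assignment for each value."""
    return np.argmin(np.abs(x[:, None] - codes[None, :]), axis=1)

# Non-uniform quantizer, sketched with Lloyd/k-means updates that pull
# codewords toward dense regions of the skewed distribution.
codes = np.quantile(x, (np.arange(K) + 0.5) / K)  # quantile init
for _ in range(20):
    a = assign(x, codes)
    for k in range(K):
        if np.any(a == k):
            codes[k] = x[a == k].mean()

def utilization(x, codes):
    """Fraction of codewords that receive at least one item."""
    a = assign(x, codes)
    return len(np.unique(a)) / len(codes)

def mse(x, codes):
    """Mean squared reconstruction error of the quantizer."""
    a = assign(x, codes)
    return float(np.mean((x - codes[a]) ** 2))
```

On this skewed sample, the data-adapted codebook attains lower reconstruction error than the uniform one, which is the mechanism behind the "better codebook utilization" claim.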
Attention's quadratic complexity is no longer a bottleneck: DASH-KV achieves linear O(N) inference without sacrificing accuracy by reformulating attention as an approximate nearest-neighbor search.
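DASH-KV's exact mechanism is not given in this one-liner; the general reformulation it alludes to can be sketched as top-k attention, where each query attends only to its nearest keys. The sketch below scans the cache exactly to find the top keys (a real system would use an ANN index to avoid the scan); all names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1024, 32              # cached tokens, head dim (toy sizes)
keys = rng.normal(size=(n, d))
vals = rng.normal(size=(n, d))
q = 4.0 * keys[7]            # a query that strongly matches one cached key

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# Exact attention: score against every key in the cache.
scores = keys @ q / np.sqrt(d)
full = softmax(scores) @ vals

# ANN-style attention (sketch): keep only the k highest-scoring keys --
# the keys an approximate nearest-neighbor index would return -- and
# renormalize the softmax over that subset.
k = 64
top = np.argpartition(scores, -k)[-k:]
approx = softmax(scores[top]) @ vals[top]

# Relative error of the sparse approximation vs. full attention.
err = np.linalg.norm(full - approx) / np.linalg.norm(full)
```

Because softmax weights concentrate on the highest-scoring keys, the k retrieved neighbors carry nearly all the attention mass, which is why retrieval-based attention can match full attention while touching only a small, index-selected slice of the KV cache.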
Achieve >95% forget quality in LLMs with minimal side effects by isolating and unlearning tokens within target subdomains using asymmetric LoRA.
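The summary names asymmetric LoRA as the vehicle for targeted unlearning. A minimal sketch of that asymmetry, under assumed toy sizes and a stand-in objective: the LoRA `A` projection is frozen and only `B` is trained, so every unlearning update is confined to the low-rank subspace `A` defines; here gradient descent on the log-probability of a "forget" token drives its probability down.

```python
import numpy as np

rng = np.random.default_rng(2)
V, d, r = 10, 8, 4   # vocab size, hidden dim, LoRA rank (toy sizes)

W0 = rng.normal(size=(V, d)) * 0.1   # frozen base output weights
A = rng.normal(size=(r, d))          # LoRA "A": frozen (the asymmetry)
B = np.zeros((V, r))                 # LoRA "B": the only trained part

x_forget = rng.normal(size=d)        # hidden state of a token to unlearn
t = 3                                # target token id to forget

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def probs(x):
    # Output distribution with the LoRA delta B @ A applied.
    return softmax((W0 + B @ A) @ x)

p_before = probs(x_forget)[t]

# Unlearning sketch: gradient *descent* on log p(t | x_forget), updating
# only B so the edit lives in the subspace spanned by frozen A.
lr = 0.1
for _ in range(30):
    p = probs(x_forget)
    grad_logits = np.eye(V)[t] - p                  # d log p_t / d logits
    B -= lr * np.outer(grad_logits, A @ x_forget)   # chain rule through B @ A

p_after = probs(x_forget)[t]
```

Freezing `A` is what isolates the edit: since the update to the weights is always `delta_B @ A`, unrelated inputs that project weakly onto `A`'s row space are less disturbed, which is the intuition behind "minimal side effects" in the summary.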