University of Chinese Academy of Sciences, Meituan
Training long-context sparse attention models doesn't have to be a slow, imbalanced mess: SparseBalance achieves a 1.33x training speedup while *improving* accuracy.