Search papers, labs, and topics across Lattice.
The Chinese University of Hong Kong, Ling Team
2
0
5
QK normalization can be effectively integrated into MLA without the overhead of full key caching, leading to improved performance and efficiency.
Federated learning can adapt to asynchronous data drift with up to 83% less retraining cost by using a Mixture-of-Experts architecture to selectively update local parameters.