The Chinese University of Hong Kong, Shenzhen, China; Shenzhen International Center for Industrial and Applied Mathematics; Shenzhen Research Institute of Big Data
Polynomial preconditioning can significantly enhance LLM training stability and performance without adding inference overhead.
Adam's notorious divergence problem? Solved (with the right hyperparameter tuning).