Shanghai Jiao Tong University
Small initialization can dramatically enhance reasoning performance in large language models, revealing a new lever for improving AI capabilities.
Adam can achieve linear convergence on highly degenerate polynomials without careful tuning, thanks to a built-in mechanism that exponentially amplifies the effective learning rate.
Stop wasting compute: PonderLM-3 learns to spend extra inference FLOPs only on the tokens that actually need them, outperforming fixed-step pondering methods.