Search papers, labs, and topics across Lattice.
2
0
5
By dynamically balancing fast adaptation and stable averaging, AMUSE delivers faster convergence and better final performance than AdamW and Muon, all without any learning rate tuning.
Current QA models boasting 70+ F1 on existing benchmarks crumble on SPARTA, a new dataset revealing their surprisingly shallow reasoning abilities across tables and text.