Search papers, labs, and topics across Lattice.
2
0
4
Generalization in LLMs hinges on training reward saturation dynamics, with reasoning faithfulness emerging as a critical predictor of success under weak supervision.
Forget brute-force scaling: targeted data curation for RLVR can unlock surprisingly large gains in LLM reasoning.