Generalization in LLMs hinges on the dynamics of training-reward saturation, with reasoning faithfulness emerging as a critical predictor of success under weak supervision.