LLMs can be taught to "think longer" and explore more diverse reasoning paths in-context via a simple length-incentivized reward, leading to improved generalization.
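A length-incentivized reward of this kind can be sketched as a correctness reward plus a small, capped bonus for longer reasoning traces. This is an illustrative assumption, not the paper's exact scheme: the weight, the cap, and the token count are all hypothetical parameters.

```python
def reward(is_correct: bool, num_reasoning_tokens: int,
           length_bonus_weight: float = 0.0001,
           max_bonus_tokens: int = 4096) -> float:
    """Correctness reward plus a small bonus for longer reasoning,
    capped so the model cannot farm reward with unbounded output."""
    base = 1.0 if is_correct else 0.0
    # Bonus is capped and kept well below the correctness reward,
    # so longer thinking is encouraged but never substitutes for it.
    bonus = length_bonus_weight * min(num_reasoning_tokens, max_bonus_tokens)
    return base + bonus
```

Capping the bonus below the correctness reward is one way to keep the incentive from dominating: a wrong answer with a very long trace still scores lower than a short correct one.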
RLVR training leaves a tell-tale sign: prompts encountered during fine-tuning produce unusually similar reasoning trajectories, detectable without access to model internals.
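The detection idea above can be sketched in a black-box setting: sample several completions for a prompt, measure how similar the trajectories are, and flag prompts whose samples are unusually alike. The Jaccard word-overlap metric and the threshold below are illustrative assumptions, not the paper's method.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sampled trajectories (illustrative metric)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def mean_pairwise_similarity(samples: list[str]) -> float:
    """Average similarity over all pairs of sampled trajectories."""
    pairs = list(combinations(samples, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def likely_seen_in_training(samples: list[str], threshold: float = 0.8) -> bool:
    """Flag a prompt whose sampled trajectories are suspiciously similar.
    The threshold is a hypothetical value; in practice it would be
    calibrated against prompts known to be outside the training set."""
    return mean_pairwise_similarity(samples) > threshold
```

Only sampled text is needed, which is what makes the test possible without access to model internals: near-duplicate traces across independent samples suggest the prompt was seen during fine-tuning, while diverse traces suggest it was not.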