LRMs can often recover from injected errors in their reasoning steps, revealing a hidden "critique" ability that can be harnessed to improve performance without additional training.
Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse as the model's prior sharpens, but external rewards grounded in computational asymmetries offer a path to sustained scaling.