Reinforcement learning (RL) with verifiable rewards can yield substantial performance gains in compact LLMs, but it does not guarantee robust physical reasoning; instead, it tends to induce procedural solution templates.