Training LLMs for efficient reasoning works best with easier prompts, which keep the positive reward signal dense and prevent undesirable length collapse.