Pass-rate-1 prompts got you down? Composition-RL boosts LLM reasoning by automatically composing multiple problems into new verifiable questions, making better use of your existing data.
Students can surpass their teachers in on-policy distillation by extrapolating rewards and merging knowledge from domain experts, challenging the conventional wisdom that students are inherently limited by their teachers' capabilities.