Search papers, labs, and topics across Lattice.
3
0
7
7
Multi-iteration experience learning in LLMs can lead to capability collapse, but strategic adjustments in experience granularity and injection patterns can stabilize and enhance performance.
Standard RLVR's token-level updates are often dominated by irrelevant tokens, but DelTA's discriminative reweighting unlocks significant performance gains in reasoning tasks.
Students can surpass their teachers in on-policy distillation by extrapolating rewards and merging knowledge from domain experts, challenging the conventional wisdom that students are inherently limited by their teachers' capabilities.