Achieve over 2x training speedup for LLM reasoning without sacrificing accuracy by dynamically pruning Group Relative Policy Optimization (GRPO) with a novel importance sampling correction.