Stop wasting compute: WS-GRPO learns when an LLM should stop reasoning. By turning final-answer correctness into prefix-level guidance, it slashes rollout lengths without sacrificing accuracy.