Stop wasting compute: WS-GRPO learns when an LLM should stop reasoning. It turns final-answer correctness into prefix-level guidance, cutting rollout lengths without sacrificing accuracy.
Achieve gains of nearly 20 accuracy points on reasoning tasks by dynamically routing between latent and discrete reasoning spaces based on model confidence.