Search papers, labs, and topics across Lattice.
1
0
3
0
Stripping away the complexity of GRPO reveals that simple REINFORCE with group relative advantage can actually *improve* LLM reasoning, challenging the assumption that sophisticated loss functions are always better.