Search papers, labs, and topics across Lattice.
1
0
3
Stop rewarding all LLM-generated candidates equally: ShapE-GRPO uses Shapley values to fairly distribute credit within sets, leading to better training and faster convergence.