MIT. Email: ruiai
Stop rewarding all LLM-generated candidates equally: ShapE-GRPO uses Shapley values to fairly distribute credit within sets, leading to better training and faster convergence.
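
To make the credit-assignment idea concrete, here is a minimal sketch of exact Shapley values for a small set of candidates. It is not taken from ShapE-GRPO itself: the function names, the example scores, and the choice of "best candidate in the set" as the set-level reward are all assumptions made for illustration.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List


def shapley_values(
    n: int, set_reward: Callable[[FrozenSet[int]], float]
) -> List[float]:
    """Exact Shapley values: average each candidate's marginal
    contribution to the set reward over every ordering of the set."""
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        coalition: FrozenSet[int] = frozenset()
        previous = set_reward(coalition)
        for i in order:
            coalition = coalition | {i}
            current = set_reward(coalition)
            totals[i] += current - previous  # marginal contribution of i
            previous = current
    return [t / len(orderings) for t in totals]


# Hypothetical per-candidate scores (assumption, not from the source).
scores = [1.0, 0.5, 0.0]


def best_of_set(coalition: FrozenSet[int]) -> float:
    """Assumed set-level reward: the best score in the set, 0 if empty."""
    return max((scores[i] for i in coalition), default=0.0)


credits = shapley_values(len(scores), best_of_set)
equal_split = [best_of_set(frozenset(range(len(scores)))) / len(scores)] * len(scores)
print("Shapley credit:", credits)
print("Equal credit:  ", equal_split)
```

Under these assumptions the credits sum to the reward of the full set, the strongest candidate receives the largest share, and a candidate that never changes the set reward receives none, whereas an equal split gives all three the same credit. Enumerating every ordering costs n! reward evaluations, so this form is only practical for small sets.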