Qwen Large Model Application Team, Alibaba
Ditch the black-box reward function: this new rubric-based RL framework uses LLMs to judge responses against interpretable criteria, offering a more robust and transparent approach to alignment.