Alibaba Group
Forget static rubrics and expensive external models: EvoRubric co-evolves a single policy to generate both responses and the rubrics to evaluate them, outperforming traditional RLHF methods in open-ended generation tasks.