Search papers, labs, and topics across Lattice.
1
0
2
By reflecting on its own reasoning, ReflectRM achieves a +10.2 improvement in mitigating positional bias compared to leading generative reward models, making it a far more stable evaluator.