Trinity College Dublin
Forget hand-crafted reward functions: CM2 uses checklists to train tool-using agents, outperforming SFT baselines by up to 12 points on key benchmarks.