Current reward models struggle to distinguish good from bad agent behavior in complex tool-using scenarios, especially over long horizons, revealing a critical gap in alignment research.