Search papers, labs, and topics across Lattice.
4
0
7
2
Current reward models struggle to distinguish good vs. bad agent behavior in complex tool-using scenarios, especially over long horizons, revealing a critical gap in alignment research.
An 8B parameter model, RideJudge, outperforms 32B baselines in ride-hailing dispute adjudication by aligning visual semantics with evidentiary protocols, achieving 88.41% accuracy.
Current MLLMs struggle with even basic route planning in remote sensing, highlighting a critical gap in their ability to translate perception into action in complex, real-world scenarios.
LLM agents can learn to solve complex, long-horizon tasks much more effectively by using themselves as post-hoc critics to refine Q-values through hindsight reasoning.