Independent Researcher
2
0
3
5
Even the top-performing conversational agents struggle with reliability, hitting only 57% accuracy on a new benchmark designed to test agentic recommender systems.
Even frontier models with high reasoning budgets fail to navigate densely interlinked knowledge bases and complex policies effectively in realistic fintech customer support scenarios, achieving a pass rate of only ~25.5%.