Shanghai Jiao Tong University
Function calling gets a serious upgrade: a new reward model and inference scaling technique boost performance by focusing on the *process* of tool use, not just the outcome.
Current LLM evaluation benchmarks often conflate chatbots with true AI agents, leading to misaligned research efforts. This survey provides a framework for targeted evaluation based on environmental complexity and agent capabilities.