The paper introduces FAMA, a framework that improves the reliability of open-source LLM agents in tool-use environments by addressing error accumulation in multi-turn conversations. FAMA first analyzes failure trajectories to identify common errors, then uses a meta-agentic orchestration mechanism to activate specialized agents that inject targeted context to mitigate these failures. Experiments show that FAMA improves performance by up to 27% compared to standard baselines across various open-source LLMs.
Open-source LLM agents can gain up to 27% in tool-use performance when context tailored to their most common failure modes is injected before each decision.
Large Language Models are increasingly deployed as the decision-making core of autonomous agents capable of effecting change in external environments. Yet in conversational benchmarks that simulate real-world, customer-centric issue resolution, these agents frequently fail because of the cascading effects of incorrect decisions. These challenges are particularly pronounced for open-source LLMs with smaller parameter counts, limited context windows, and constrained inference budgets, all of which contribute to increased error accumulation in agentic settings. To tackle these challenges, we present the Failure-Aware Meta-Agentic (FAMA) framework. FAMA operates in two stages: first, it analyzes failure trajectories from baseline agents to identify the most prevalent errors; second, it employs an orchestration mechanism that activates a minimal subset of specialized agents tailored to address these failures by injecting targeted context for the tool-use agent before the decision-making step. Experiments across open-source LLMs demonstrate performance gains of up to 27% across evaluation modes over standard baselines. These results highlight that targeted curation of context through specialized agents to address common failures is a valuable design principle for building reliable, multi-turn tool-use LLM agents that simulate real-world conversational scenarios.
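The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Specialist`, `rank_failures`, `orchestrate`, and `build_prompt`, the error labels, and the `is_relevant` gating predicate are all hypothetical, and the sketch assumes that failure trajectories have already been annotated with one error label each.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Specialist:
    """Hypothetical specialized agent: turns the conversation so far into a hint."""
    error_type: str
    make_hint: Callable[[List[str]], str]


def rank_failures(failure_labels: List[str], top_k: int) -> List[str]:
    """Stage 1 (assumed form): return the top_k most frequent error labels."""
    return [label for label, _ in Counter(failure_labels).most_common(top_k)]


def orchestrate(
    history: List[str],
    specialists: Dict[str, Specialist],
    prevalent: List[str],
    is_relevant: Callable[[str, List[str]], bool],
) -> List[str]:
    """Stage 2 (assumed form): activate only the specialists that target a
    prevalent error and that the gating predicate marks as relevant now."""
    hints = []
    for error_type in prevalent:
        specialist = specialists.get(error_type)
        if specialist is not None and is_relevant(error_type, history):
            hints.append(specialist.make_hint(history))
    return hints


def build_prompt(history: List[str], hints: List[str]) -> str:
    """Place the targeted hints ahead of the turns the tool-use agent will read."""
    context = "\n".join(f"[hint] {hint}" for hint in hints)
    turns = "\n".join(history)
    return f"{context}\n{turns}" if context else turns
```

In use, `rank_failures` would run once over the labeled baseline failures, while `orchestrate` and `build_prompt` would run before every decision step of the tool-use agent. The gating predicate is a plain function here to keep the sketch self-contained; the orchestration mechanism described in the abstract may well be model-driven.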