This paper introduces ACCORD (Action-Conditioned Contextual Grounding), a framework that improves the contextual grounding of language agents by probing the environment for missing information before each action. The authors observe that current agents often act on assumed rather than observed specifics, which lowers task performance. ACCORD also integrates relevant evidence from the agent's trajectory that would otherwise be overlooked. Without additional training or task-success signals, it improves task-goal completion on AppWorld by up to 20.6 points (from 42.0% to 62.6% with GPT-5-mini), and the gains carry over to the embodied AlfWorld benchmark (+7.4 success rate with GPT-5-mini).
Agents can raise task-goal completion by up to 20.6 percentage points (from 42.0% to 62.6% on AppWorld with GPT-5-mini) by grounding their actions in observed context rather than assumptions.
User instructions are often underspecified because humans rely on implicit assumptions about the surrounding environment. For large language model (LLM) agents operating in information-rich digital and physical environments, these assumptions cannot be inferred from the instruction alone; they must be recovered from the current state of tools, data, interfaces, and observations. Effective execution therefore requires agents to identify missing context, ground it in observed evidence, and carry it forward into subsequent actions. We show that current agents often fail to do so. They act from assumed rather than observed specifics, overlook information they could have gathered, and fail to incorporate evidence that has already been returned. Building on this insight, we propose ACCORD (Action-Conditioned Contextual Grounding), a simple and effective agent framework for adaptive grounding. Before each action, ACCORD actively probes the environment for missing information and integrates relevant context from the agent's trajectory that would otherwise be overlooked. Requiring no additional training or task-success signals, ACCORD improves task-goal completion on AppWorld by up to +20.6 points with GPT-5-mini, from 42.0% to 62.6%, compared to strong baselines. These gains persist with a substantially stronger base model (+10.8 with Claude-4.5-sonnet), an open-weight model (+10.1 with Qwen3.5-27B-FP8), and on the embodied AlfWorld benchmark (+7.4 success rate with GPT-5-mini).
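The probe-then-act pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ToyEnv`, `Step`, and `grounded_step` are hypothetical, the environment is a toy key-value store, and the list of required context keys is supplied by hand, whereas an agent would presumably have a language model decide what is missing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    kind: str         # "probe" or "action"
    call: str         # what was issued to the environment
    observation: str  # what the environment returned


@dataclass
class ToyEnv:
    """Hypothetical environment: a key-value store the agent can query."""
    state: dict[str, str]

    def probe(self, key: str) -> str:
        return self.state.get(key, "<unknown>")

    def act(self, action: str) -> str:
        return f"executed: {action}"


def grounded_step(
    env: ToyEnv,
    trajectory: list[Step],
    required: list[str],
    propose_action: Callable[[dict[str, str]], str],
) -> str:
    """Run one action, grounding it in observed context first (illustrative)."""
    # 1. Reuse evidence that earlier probes already returned.
    context: dict[str, str] = {}
    for step in trajectory:
        if step.kind == "probe" and step.call in required:
            context[step.call] = step.observation

    # 2. Probe the environment only for what is still missing.
    for key in required:
        if key not in context:
            observation = env.probe(key)
            trajectory.append(Step("probe", key, observation))
            context[key] = observation

    # 3. Act on observed specifics rather than assumed ones.
    action = propose_action(context)
    trajectory.append(Step("action", action, env.act(action)))
    return action


# Usage: "add this song to my playlist" names neither the song nor the playlist.
env = ToyEnv({"current_song": "Blue in Green", "default_playlist": "Road Trip"})
trajectory = [Step("probe", "current_song", "Blue in Green")]  # already observed
action = grounded_step(
    env,
    trajectory,
    required=["current_song", "default_playlist"],
    propose_action=lambda ctx: (
        f"add '{ctx['current_song']}' to '{ctx['default_playlist']}'"
    ),
)
print(action)
```

In this sketch the song is recovered from the existing trajectory and only the playlist triggers a new probe, which mirrors the two failure modes the abstract names: overlooking information that could have been gathered, and ignoring evidence that has already been returned.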