Notre Dame · Mar 31, 2026 · arXiv:2604.00137

Open, Reliable, and Collective: A Community-Driven Framework for Tool-Using AI Agents

AI Summary

The paper introduces OpenTools, a community-driven framework designed to improve the reliability of tool-integrated LLMs by standardizing tool schemas, providing plug-and-play wrappers, and enabling continuous evaluation via automated test suites. The authors argue that both tool-use accuracy and intrinsic tool accuracy are critical for reliable tool use, and that existing work has focused too heavily on the former. Experiments show improved end-to-end reproducibility and task performance, with community-contributed, task-specific tools delivering 6%-22% relative gains over an existing toolbox across multiple agent architectures.

Key Contribution

Community-sourced tools are not just more diverse; they can substantially boost agent performance, which suggests that the quality of the tools themselves, not just how agents use them, is a major bottleneck.

Abstract

Tool-integrated LLMs can retrieve, compute, and take real-world actions via external tools, but reliability remains a key bottleneck. We argue that failures stem from both tool-use accuracy (how well an agent invokes a tool) and intrinsic tool accuracy (the tool's own correctness), while most prior work emphasizes the former. We introduce OpenTools, a community-driven toolbox that standardizes tool schemas, provides lightweight plug-and-play wrappers, and evaluates tools with automated test suites and continuous monitoring. We also release a public web demo where users can run predefined agents and tools and contribute test cases, enabling reliability reports to evolve as tools change. OpenTools includes the core framework, an initial tool set, evaluation pipelines, and a contribution protocol. Experiments and evaluations show improved end-to-end reproducibility and task performance; community-contributed, higher-quality task-specific tools deliver 6%-22% relative gains over an existing toolbox across multiple agent architectures on downstream tasks and benchmarks, highlighting the importance of intrinsic tool accuracy.
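
The distinction between tool-use accuracy and intrinsic tool accuracy can be made concrete with a small sketch. The code below is illustrative only and is not the OpenTools API: `ToolSpec`, `ToolTestCase`, `intrinsic_accuracy`, and the word-count tools are hypothetical names chosen for this example. It assumes a tool is described by a simple schema, wrapped so that arguments are validated before the call, and scored against a test suite independently of any agent.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpec:
    """Hypothetical standardized tool schema (not the OpenTools format)."""
    name: str
    description: str
    parameters: dict[str, type]  # argument name -> expected Python type
    func: Callable[..., Any]

    def __call__(self, **kwargs: Any) -> Any:
        # Wrapper step: reject malformed calls before running the tool.
        # Failures here are invocation errors (tool-use accuracy).
        if set(kwargs) != set(self.parameters):
            raise TypeError(f"{self.name}: expected arguments {sorted(self.parameters)}")
        for key, expected in self.parameters.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{self.name}: '{key}' must be {expected.__name__}")
        return self.func(**kwargs)


@dataclass
class ToolTestCase:
    """One well-formed call and the output a correct tool should return."""
    arguments: dict[str, Any]
    expected: Any


def intrinsic_accuracy(tool: ToolSpec, cases: list[ToolTestCase]) -> float:
    """Fraction of test cases the tool answers correctly when called correctly."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        try:
            if tool(**case.arguments) == case.expected:
                passed += 1
        except Exception:
            pass  # a crash on valid input counts as a failed case
    return passed / len(cases)


# Two implementations behind the same schema: one with a whitespace bug.
buggy = ToolSpec("word_count", "Count words in text", {"text": str},
                 lambda text: len(text.split(" ")))
fixed = ToolSpec("word_count", "Count words in text", {"text": str},
                 lambda text: len(text.split()))

cases = [
    ToolTestCase({"text": "a b"}, 2),
    ToolTestCase({"text": "a  b"}, 2),   # double space
    ToolTestCase({"text": ""}, 0),       # empty input
    ToolTestCase({"text": "hello"}, 1),
]

buggy_score = intrinsic_accuracy(buggy, cases)
fixed_score = intrinsic_accuracy(fixed, cases)
print(f"buggy: {buggy_score:.2f}, fixed: {fixed_score:.2f}")
```

In this sketch every test call is well formed, so any failure is attributable to the tool itself. An agent that invokes the buggy tool perfectly would still get wrong answers on half of these inputs, which is the kind of failure the abstract attributes to intrinsic tool accuracy.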

Eval Frameworks & Benchmarks · Open-Source Models & Weights · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
