Berkeley AI Research (BAIR)

13 papers from Berkeley AI Research (BAIR) on Tool Use & Agents

Apr 28, 2026

DAMO · 3w ago · also BAIR, Tsinghua AI, Intel Labs, Rice

Pythia: Toward Predictability-Driven Agent-Native LLM Serving

Multi-agent LLM systems are leaving performance on the table by treating structured agent interactions as generic traffic; Pythia shows how to unlock substantial gains by exploiting workflow semantics at the serving layer.

Xin Jin, Xuanzhe Liu

Distributed Systems & Hardware Inference & Quantization Tool Use & Agents

Apr 27, 2026

BAIR · 3w ago · also Adobe Research, Cisco AI Research, Dolby Laboratories, Oregon +5

A Survey on LLM-based Conversational User Simulation

LLMs are revolutionizing conversational AI research, and this survey offers a structured guide to navigating the rapidly evolving landscape of LLM-powered user simulation.

B. Ni, Bo Ni, Yu Wang +35

Natural Language Processing Tool Use & Agents World Models & Planning

Apr 13, 2026

BAIR · Apr 13, 2026 · also Microsoft Research, Center for Computational Biology, Dept. of EECS, Dept. of Statistics

Sanity Checks for Agentic Data Science

Agentic data science pipelines often reach falsely optimistic conclusions, but two simple sanity checks can expose these unsupported claims by testing if the agent can reliably distinguish signal from noise.

Zachary T. Rewolinski, Austin V. Zane, Cheng-Long Wang

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Tool Use & Agents

Apr 9, 2026

BAIR · Apr 9, 2026

We Need Strong Preconditions For Using Simulations In Policy

LLM-powered simulations of societal behavior risk encoding and amplifying existing biases unless strict ethical preconditions are enforced.

Steven Luo, S. Luo, Saanvi Arora +1

Constitutional AI & AI Ethics Tool Use & Agents World Models & Planning

Apr 6, 2026

UC Santa Cruz · Apr 6, 2026 · also BAIR, ByteDance, Tencent AI

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Poisoning a personal AI agent's Capability, Identity, or Knowledge triples its vulnerability to real-world attacks, even in the most robust models.

Zijun Wang, Haoqin Tu, Letian Zhang +13

Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness Tool Use & Agents

Apr 5, 2026

Stanford HAI · Apr 5, 2026 · also Amazon Science, BAIR

Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

Scaling prompt learning by 17x without sacrificing accuracy is now possible, unlocking efficient self-improvement for LLM agents.

Hanchen Li, Runyuan He, Qizheng Zhang +13

Distributed Systems & Hardware Scaling Laws & Emergent Abilities Tool Use & Agents +1

Apr 1, 2026

Apr 1, 2026 · also BAIR, UC Santa Cruz, UCSC, UPenn

Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory

Forget hyperparameter tuning – autonomous research reveals that bug fixes and architectural tweaks unlock far greater gains in multimodal agent memory.

Jiaqi Liu, Zipeng Ling, Shi Qiu +9

Architecture Design (Transformers, SSMs, MoE) Multimodal Models Tool Use & Agents

Mar 11, 2026

BAIR · Mar 11, 2026 · also UIUC

The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey

Securing AI agents demands a new security paradigm, as their integration of LLMs with traditional systems introduces vulnerabilities beyond those of standard software.

Juhee Kim, Xiaoyuan Liu, Zhun Wang +2

Red-Teaming & Adversarial Robustness Tool Use & Agents

Mar 4, 2026

BAIR · Mar 4, 2026

iAgentBench: Benchmarking Sensemaking Capabilities of Information-Seeking Agents on High-Traffic Topics

Existing QA benchmarks are too easy for LLMs, so iAgentBench offers a more realistic challenge by requiring agents to synthesize information from multiple sources on high-traffic topics.

Preetam Prabhu Srikar Dammu, A. Palkhiwala, Tanya Roosta +1

Eval Frameworks & Benchmarks Recommendation & Information Retrieval Tool Use & Agents

Google Research · Mar 4, 2026 · also BAIR, DeepMind

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach can double task completion efficiency while mitigating these risks.

Haoyu Liu, Dingcheng Li, Lukas Rutishauser +1

Multimodal Models Red-Teaming & Adversarial Robustness Tool Use & Agents

Mar 2, 2026

Mar 2, 2026 · also BAIR

Strategic Advice in the Age of Personal AI

Advisor performance paradoxically suffers most when personal AI is used moderately, highlighting the complex strategic interactions introduced by personal AI assistants.

Yueyang Liu, Wichinpong Park Sinchaisri

Natural Language Processing Recommendation & Information Retrieval Tool Use & Agents

Feb 25, 2026

BAIR · Feb 25, 2026 · also Stanford HAI

Power and Limitations of Aggregation in Compound AI Systems

Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a path to overcome individual model limitations.

Nivasini Ananthakrishnan, Meena Jagadeesan

Constitutional AI & AI Ethics Scalable Oversight & Alignment Theory Tool Use & Agents

Feb 17, 2026

BAIR · Feb 17, 2026 · also CMU ML

Edison 3.0: A Multimodal RAG System for Large-Scale Educational Q&A with Human-in-the-Loop Oversight

An educational RAG system achieves 84% accuracy in answering student questions with minimal human editing, suggesting a practical path towards scalable AI-assisted teaching.

Meenakshi Mittal, Rishi Khare, Mihran Miroyan +2

Multimodal Models Recommendation & Information Retrieval Tool Use & Agents
