13 papers from Berkeley AI Research (BAIR) on Tool Use & Agents
Multi-agent LLM systems are leaving performance on the table by treating structured agent interactions as generic traffic; Pythia shows how to unlock substantial gains by exploiting workflow semantics at the serving layer.
LLMs are reshaping conversational AI research, and this survey offers a structured guide to the rapidly evolving landscape of LLM-powered user simulation.
Agentic data science pipelines often reach falsely optimistic conclusions, but two simple sanity checks can expose these unsupported claims by testing whether the agent can reliably distinguish signal from noise.
LLM-powered simulations of societal behavior risk encoding and amplifying existing biases unless strict ethical preconditions are enforced.
Poisoning a personal AI agent's Capability, Identity, or Knowledge triples its vulnerability to real-world attacks, even for the most robust models.
Scaling prompt learning by 17x without sacrificing accuracy is now possible, unlocking efficient self-improvement for LLM agents.
Forget hyperparameter tuning: autonomous research reveals that bug fixes and architectural tweaks yield far greater gains in multimodal agent memory.
Securing AI agents demands a new security paradigm, as their integration of LLMs with traditional systems introduces vulnerabilities beyond those of standard software.
Existing QA benchmarks are too easy for LLMs, so iAgentBench offers a more realistic challenge by requiring agents to synthesize information from multiple sources on high-traffic topics.
Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach can double task completion efficiency while mitigating these risks.
Advisor performance paradoxically suffers most when personal AI is used moderately, highlighting the complex strategic interactions these assistants introduce.
Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a way to overcome the limitations of any single model.
An educational RAG system achieves 84% accuracy in answering student questions with minimal human editing, suggesting a practical path towards scalable AI-assisted teaching.