13 papers from Berkeley AI Research (BAIR) on Tool Use & Agents
Training domain-specific coding LLMs with realistic environments and large-scale RL can yield substantial gains in practical software engineering tasks.
A novel meta-learning framework lets LLM agents learn on the fly: by synthesizing new skills from failure trajectories and optimizing the base policy during inactive periods, agents adapt to evolving user needs without disruptive downtime.
Securing AI agents demands a new security paradigm, as their integration of LLMs with traditional systems introduces vulnerabilities beyond those of standard software.
Skip expensive supervised fine-tuning: this RL-only method teaches LLMs to use tools by demonstrating tool calls in-context, then gradually removing those crutches until the models can use tools zero-shot.
Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach mitigates these risks while doubling task completion efficiency.
LLMs can now generate more accurate and complex CAD models by pointing to existing geometric entities, rather than relying on discretized command sequences prone to topological errors.
Existing QA benchmarks are too easy for LLMs, so iAgentBench offers a more realistic challenge by requiring agents to synthesize information from multiple sources on high-traffic topics.
Advisor performance paradoxically suffers most when personal AI is used moderately, highlighting the complex strategic interactions introduced by personal AI assistants.
Human-written solutions can actually *hurt* model performance on math problems, highlighting a critical gap between strategy usage and executability that Selective Strategy Retrieval (SSR) effectively bridges.
Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a path to overcome individual model limitations.
LLM-driven program evolution gets a smart upgrade: AdaEvolve dynamically allocates resources to promising solution candidates, leaving static schedules in the dust.
An educational RAG system achieves 84% accuracy in answering student questions with minimal human editing, suggesting a practical path towards scalable AI-assisted teaching.
LLMs can't reliably generate the very skills that boost their performance, and smaller models equipped with expert-crafted skills can rival larger, skill-less models.