Medical AI Scientist leapfrogs generic LLMs in clinical research, generating higher-quality, evidence-backed hypotheses and manuscripts that rival top-tier medical publications.
Generative multi-agent systems spontaneously exhibit collusion and conformity that mirror societal pathologies, emerging without explicit programming and bypassing individual agents' safeguards.
LLM agents can slash task completion time by almost 50% simply by predicting and pre-executing likely tool calls.
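The idea behind pre-executing likely tool calls can be sketched as speculative execution: while the LLM is still deciding, a cheap predictor guesses the next call and runs it in the background, and the result is reused on a hit. This is a minimal illustration, not the paper's implementation; `predict_call`, `decide_call`, and the `tools` dict are hypothetical names.

```python
import concurrent.futures

def run_with_speculation(predict_call, decide_call, tools):
    """Speculatively pre-execute a predicted tool call while the LLM decides."""
    pred_name, pred_args = predict_call()  # cheap, fast guess
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # Start the predicted tool call immediately, in the background.
        future = pool.submit(tools[pred_name], **pred_args)
        name, args = decide_call()  # slow authoritative LLM decision
        if (name, args) == (pred_name, pred_args):
            return future.result()  # prediction hit: reuse the cached result
        future.cancel()  # prediction miss: discard and run the real call
        return tools[name](**args)
```

On a hit the tool's latency overlaps with the model's decision latency, which is where the reported time savings would come from.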
AI-generated code's fluency masks a critical flaw: it often fails to deliver what users actually intend, highlighting the urgent need for "intent formalization" to bridge the gap between informal requirements and precise program behavior.
Forget direct prompt editing: this agentic planning framework, powered by offline RL and synthetic data, masters complex image styling by breaking it down into interpretable tool sequences.
Automated building and testing of software repositories across languages and platforms is now possible, unlocking scalable benchmarking and training for coding agents.
A 4B parameter SLM can now rival frontier agent performance in complex tool-use environments, thanks to a novel reinforcement finetuning framework that teaches it how to strategically acquire context and execute actions.
LLM agents can now proactively protect user privacy with a new reinforcement learning approach that outperforms static defenses by 14% while maintaining helpfulness.
Agentic LLMs can be taught to refuse harmful actions with up to 50% greater success, even zero-shot across diverse models and tasks, by explicitly learning when *not* to act.
Forget same-family constraints: you can compress prompts for LLaMA with a Qwen draft model and still get 90-100% of the original performance.
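Cross-family prompt compression of this kind typically works by having a small draft model score each token's informativeness and dropping the lowest-scoring ones until a budget is met. The sketch below illustrates that scheme only; `score_tokens` is a stand-in for a real draft-model call, and none of the names come from the paper.

```python
def compress_prompt(tokens, score_tokens, keep_ratio=0.5):
    """Keep the highest-scoring fraction of tokens, preserving their order.

    `score_tokens` maps a token list to per-token informativeness scores
    (in practice, e.g. negative log-probabilities from a small draft model).
    """
    scores = score_tokens(tokens)
    k = max(1, int(len(tokens) * keep_ratio))  # token budget
    # Indices of the k most informative tokens.
    keep = set(sorted(range(len(tokens)), key=lambda i: scores[i],
                      reverse=True)[:k])
    # Emit surviving tokens in their original order.
    return [t for i, t in enumerate(tokens) if i in keep]
```

Because the scoring model only ranks tokens, it need not share a tokenizer family with the target model, which is the point of the cross-family result.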
LLM agents can learn to explore novel states and generalize to new tasks with a hybrid on- and off-policy RL framework that leverages memory.
GUI agents can achieve significantly stronger task-solving capabilities through carefully designed post-training and data curation, without relying on costly online data collection.
AgentOS reimagines LLMs as reasoning kernels within a structured OS, offering a blueprint for more robust and scalable AI agents.
Forget slow, reactive GUI agents – ActionEngine uses a state-machine memory to plan actions programmatically, slashing costs by 11.8x and doubling speed while boosting task success to 95%.
Imagine a world where web agents don't just click and type, but orchestrate complex tasks with the reliability of APIs – Web Verbs offer a path to that future.
World models can now effectively simulate complex desktop software environments like Microsoft Office, enabling agents to reason about actions before execution and significantly improving performance.
Guaranteeing consistent communication between AI agents is now possible: a new certification protocol slashes disagreement by up to 96% by ensuring agents share a common understanding of terms.
Forget static, homogeneous multi-agent systems: Team-of-Thoughts unlocks superior performance by dynamically orchestrating heterogeneous agents based on calibrated coordination and self-assessed domain expertise.
Enterprise AI assistants can achieve zero data retention, but the architectural and compliance paths taken by Salesforce and Microsoft reveal significant trade-offs.
Even the best LLMs fail more than 40% of the time when orchestrating multiple tools in realistic scenarios, revealing critical gaps in real-world agent capabilities.
LLMs can now automate structured reporting from nurse dictations and medical order extraction from doctor-patient consultations, thanks to two new open-source datasets and an agentic pipeline for generating realistic training data.
An LLM-powered smart tutor isn't just another homework helper; it's a real-time feedback loop for instructors, revealing student struggles and enabling more effective teaching.
LLMs and VLLMs can team up to generate synthetic image data so good that it beats state-of-the-art methods and boosts performance on rare classes and open-vocabulary object detection.
Forget hand-annotated data: Magnet distills multi-turn tool-use skills into LLMs by automatically generating training trajectories that outperform even Gemini 1.5 Pro.
Forget task-specific models: Magma, a single foundation model, now outperforms them in both UI navigation and robotic manipulation by bridging verbal and action abilities.