Feb 18, 2026 · arXiv:2602.16898

MALLVI: a multi agent framework for integrated generalized robotics manipulation

Iman Ahmadi, Mehrshad Taji, Arad Mahdinezhad Kashani, AmirHossein Jadidi, Saina Kashani, Babak Khalaj

AI Summary

The paper introduces MALLVi, a multi-agent framework leveraging LLMs and VLMs for closed-loop robotic manipulation based on natural language instructions and visual feedback. MALLVi decomposes the task into specialized agents for decomposition, localization, reasoning, and reflection, enabling iterative refinement of actions. Experiments demonstrate that MALLVi achieves improved generalization and higher success rates in zero-shot manipulation tasks compared to open-loop approaches in both simulated and real-world environments.

Key Contribution

Achieve robust zero-shot robotic manipulation by orchestrating specialized LLM/VLM agents in a closed-loop system that iteratively refines actions based on environmental feedback.

Abstract

Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, making them fragile in dynamic settings. We present MALLVi, a Multi-Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVi generates executable atomic actions for a robot manipulator. After action execution, a Vision-Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVi coordinates specialized agents, Decomposer, Localizer, Thinker, and Reflector, to manage perception, localization, reasoning, and high-level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only relevant agents, avoiding full replanning. Experiments in simulation and real-world settings show that iterative closed-loop multi-agent coordination improves generalization and increases success rates in zero-shot manipulation tasks. Code available at https://github.com/iman1234ahmadi/MALLVI.
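The closed-loop control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `run_closed_loop`, the `StepResult` type, the callables standing in for the agents, the verdict strings `"ok"`, `"relocalize"`, and `"rethink"`, and the `max_attempts` cap are all hypothetical names chosen for illustration. The only behavior taken from the abstract is that an instruction is decomposed into steps, each step is localized, reasoned about, and executed, and a reflection step then decides whether to proceed or to retry by reactivating only the relevant agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    subtask: str
    attempts: int
    succeeded: bool


def run_closed_loop(
    instruction: str,
    decompose: Callable[[str], List[str]],      # stand-in for the Decomposer
    localize: Callable[[str], Tuple],           # stand-in for the Localizer
    think: Callable[[str, Tuple], str],         # stand-in for the Thinker
    execute: Callable[[str], None],             # robot interface (assumed)
    reflect: Callable[[str], str],              # stand-in for the Reflector
    max_attempts: int = 3,                      # assumed retry cap
) -> List[StepResult]:
    results = []
    for subtask in decompose(instruction):
        target = localize(subtask)
        attempts, succeeded = 0, False
        while attempts < max_attempts and not succeeded:
            attempts += 1
            action = think(subtask, target)
            execute(action)
            verdict = reflect(subtask)  # hypothetical verdict vocabulary
            if verdict == "ok":
                succeeded = True
            elif verdict == "relocalize":
                # Re-run only the localization stage, not the whole plan.
                target = localize(subtask)
            # Any other verdict (e.g. "rethink") retries with the same target.
        results.append(StepResult(subtask, attempts, succeeded))
        if not succeeded:
            break  # stop once a step exhausts its retries
    return results


def demo():
    """Run the loop with stub agents; the second step fails once."""
    localize_calls = {"n": 0}
    verdicts = iter(["ok", "relocalize", "ok"])

    def decompose(instr):
        return [s.strip() for s in instr.split(" then ")]

    def localize(subtask):
        localize_calls["n"] += 1
        return (0.0, 0.0)

    def think(subtask, target):
        return f"{subtask} @ {target}"

    def execute(action):
        pass  # a real system would command the manipulator here

    def reflect(subtask):
        return next(verdicts)

    results = run_closed_loop(
        "pick the block then place it in the bowl",
        decompose, localize, think, execute, reflect,
    )
    return results, localize_calls["n"]
```

In this sketch, a failed check on the second step triggers one extra localization call while the decomposition is left untouched, which mirrors the idea of targeted recovery without full replanning.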

Multimodal Models · Robotics & Embodied AI · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
