This paper introduces Visual Para-Thinker++, a single-policy multi-agent framework for visual reasoning in which one shared MLLM policy plays the Main, Worker, and Summary Agent roles. Worker Agents reason in parallel, and the Summary Agent reconciles their full reasoning traces. This mitigates the early perceptual commitment and hallucination that affect single-chain reasoning. Across V*, CountBench, the RefCOCO family, and HallusionBench, Visual Para-Thinker++ consistently outperforms single-trajectory and inference-time parallel baselines, with especially strong gains on hallucination-sensitive visual reasoning.
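The role-conditioned data flow summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The `policy(role, prompt)` signature, the `toy_policy` stub, and the one-subtask-per-worker allocation are hypothetical stand-ins for the shared MLLM and its fixed allocation patterns.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signature: ONE shared policy, conditioned only on a role tag.
Policy = Callable[[str, str], str]


@dataclass
class WorkerTrace:
    subtask: str
    reasoning: str  # the full trace is kept, not just a final label


def main_agent(policy: Policy, question: str, n_workers: int) -> List[str]:
    # Assumed fixed allocation pattern: one subtask per worker index.
    return [policy("main", f"{question} | subtask {i}") for i in range(n_workers)]


def run_worker(policy: Policy, image_ref: str, subtask: str) -> WorkerTrace:
    # Context isolation: a worker sees the image and its own subtask only,
    # never the other workers' traces.
    return WorkerTrace(subtask, policy("worker", f"{image_ref} | {subtask}"))


def summary_agent(policy: Policy, question: str, traces: List[WorkerTrace]) -> str:
    # The summary role reads every full trace and reconciles them,
    # instead of majority-voting on final answers.
    joined = "\n".join(f"[{t.subtask}] {t.reasoning}" for t in traces)
    return policy("summary", f"{question}\n{joined}")


def para_think(policy: Policy, image_ref: str, question: str, n_workers: int = 3) -> str:
    subtasks = main_agent(policy, question, n_workers)
    # Workers run concurrently; pool.map preserves subtask order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        traces = list(pool.map(lambda s: run_worker(policy, image_ref, s), subtasks))
    return summary_agent(policy, question, traces)


def toy_policy(role: str, prompt: str) -> str:
    # Stand-in for the shared MLLM: echoes its role so the data flow is visible.
    return f"<{role}> {prompt}"


answer = para_think(toy_policy, "img.png", "How many cups?")
```

Because every role calls the same `policy`, a single set of weights serves all three agents. Only the role tag and the visible context differ between them.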
Visual Para-Thinker++ improves visual reasoning accuracy by running context-isolated Worker Agents in parallel and having a Summary Agent reconcile their full reasoning traces, which reduces the risk of hallucination.
Visual reasoning requires integrating evidence distributed across regions, attributes, and relations, making single-chain reasoning prone to early perceptual commitment and hallucination. We propose Visual Para-Thinker++, a single-policy multi-agent framework in which one shared MLLM policy is instantiated as role-conditioned Main, Worker, and Summary Agents. The Main Agent decomposes the task with fixed allocation patterns; Worker Agents reason in parallel under context isolation; and the Summary Agent reconciles full Worker reasoning traces rather than majority-voting on final labels. The shared policy is trained by Multi-Agent Capability Injection and Role-Decoupled Multi-Agent Optimization, which assign role-specific rewards and advantages to corresponding token segments to reduce gradient conflict among collaborative roles. A native inference engine enables efficient multi-agent rollout through shared visual prefix and KV cache reuse. Across V*, CountBench, the RefCOCO family, and HallusionBench, Visual Para-Thinker++ consistently outperforms single-trajectory and inference-time parallel baselines, with especially strong gains on hallucination-sensitive visual reasoning.
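The following minimal sketch illustrates what assigning role-specific rewards and advantages to token segments can look like. The abstract does not say how the advantages are computed. The group-relative normalization (in the style of GRPO), the per-role reward dictionaries, and the names `role_advantages` and `token_advantages` are assumptions made for illustration only.

```python
from statistics import mean, pstdev
from typing import Dict, List

ROLES = ("main", "worker", "summary")


def role_advantages(rewards: List[Dict[str, float]], eps: float = 1e-6) -> List[Dict[str, float]]:
    # Assumed GRPO-style normalization, done separately for each role across
    # the rollout group, so one role's outcome does not set another role's advantage.
    advs: List[Dict[str, float]] = [{} for _ in rewards]
    for role in ROLES:
        vals = [r[role] for r in rewards]
        mu, sd = mean(vals), pstdev(vals)
        for i, v in enumerate(vals):
            advs[i][role] = (v - mu) / (sd + eps)
    return advs


def token_advantages(role_of_token: List[str], adv: Dict[str, float]) -> List[float]:
    # Broadcast each role's advantage onto the tokens that role generated.
    return [adv[role] for role in role_of_token]


# Two hypothetical rollouts of the same query, each with per-role rewards.
rewards = [
    {"main": 1.0, "worker": 0.0, "summary": 1.0},
    {"main": 0.0, "worker": 1.0, "summary": 1.0},
]
advs = role_advantages(rewards)
tokens = ["main", "worker", "worker", "summary"]  # role tag per generated token
per_token = token_advantages(tokens, advs[0])  # ≈ [1.0, -1.0, -1.0, 0.0]
```

In this toy example both summaries receive the same reward, so summary tokens get zero advantage, while the worker tokens of the first rollout are still penalized. A single scalar advantage shared by all tokens could not express that split. This is one plausible reading of the gradient conflict among collaborative roles that the role-decoupled objective is meant to reduce.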