Apr 22, 2026 · arXiv:2604.20246

Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

Adriana Aida, Walida Amer, Katarina Bankovic, Dhruv Behl, Fabian Busch, Annie Bhalla, Minh-Loi Duong, Florian Gienger, Rohan Godse, D. Grachev, Ralf Gulde, Elisa Hagensieker, Jun Hu, Junpeng Hu, Shivam Joshi, Tobias Knoblauch, L. Kumar, D. LaRocque, K. Lokesh, Omar Moured, Khiem Nguyen, Christian Preyss, Ranjith Sriganesan, Vikram Singh, Carsten Sponner, Anhthu Tong, Dominik Tuscher, Marc Tuscher, Pavani Upputuri

AI Summary

Cortex 2.0 is introduced as a plan-and-act system for industrial robotic manipulation that generates and scores candidate future trajectories in a visual latent space before committing to an action. This approach addresses the limitations of reactive Vision-Language-Action models in long-horizon tasks by mitigating compounding failure modes. Evaluated on single-arm and dual-arm platforms across four complex tasks, Cortex 2.0 consistently outperforms state-of-the-art reactive baselines, demonstrating robustness in cluttered, occlusion-prone industrial environments.

Key Contribution

World-model-based planning enables reliable robotic manipulation in complex industrial settings where reactive policies fail.

Abstract

Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. Because they optimize the next action given the current observation without evaluating potential futures, they are brittle to the compounding failure modes of long-horizon tasks. Cortex 2.0 shifts from reactive control to plan-and-act by generating candidate future trajectories in visual latent space, scoring them for expected success and efficiency, then committing only to the highest-scoring candidate. We evaluate Cortex 2.0 on single-arm and dual-arm manipulation platforms across four tasks of increasing complexity: pick and place, item and trash sorting, screw sorting, and shoebox unpacking. Cortex 2.0 consistently outperforms state-of-the-art Vision-Language-Action baselines, achieving the best results across all tasks. The system remains reliable in unstructured environments characterized by heavy clutter, frequent occlusions, and contact-rich manipulation, where reactive policies fail. These results demonstrate that world-model-based planning can operate reliably in complex industrial environments.
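The plan-and-act loop described in the abstract (generate candidates, score them, commit to the best) can be illustrated with a generic sample-score-select sketch. This is not the authors' implementation: the abstract does not specify the world model, the proposal mechanism, or the scoring function. Every name below (`plan_and_act`, `propose`, `dynamics`, `success`, `cost`, `cost_weight`) is hypothetical, the latent state is a toy one-dimensional list standing in for a visual latent, and combining success and efficiency as a weighted difference is an assumption made purely for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Latent = List[float]   # toy stand-in for a visual latent state (assumption)
Action = List[float]   # toy stand-in for a robot action (assumption)


@dataclass
class Candidate:
    actions: List[Action]   # proposed action sequence
    latents: List[Latent]   # imagined latent trajectory, including the start
    score: float            # combined success/efficiency score


def plan_and_act(
    z0: Latent,
    propose: Callable[[Latent, random.Random], Action],
    dynamics: Callable[[Latent, Action], Latent],
    success: Callable[[Sequence[Latent]], float],
    cost: Callable[[Sequence[Action]], float],
    num_candidates: int = 16,
    horizon: int = 5,
    cost_weight: float = 0.1,
    seed: int = 0,
) -> Candidate:
    """Roll out candidates in latent space and return the highest-scoring one."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    rng = random.Random(seed)
    best: Optional[Candidate] = None
    for _ in range(num_candidates):
        z, actions, latents = z0, [], [z0]
        for _ in range(horizon):
            a = propose(z, rng)      # sample an action for this imagined step
            z = dynamics(z, a)       # predict the next latent state
            actions.append(a)
            latents.append(z)
        # Hypothetical scoring rule: estimated success minus weighted effort.
        s = success(latents) - cost_weight * cost(actions)
        if best is None or s > best.score:
            best = Candidate(actions, latents, s)
    return best


# Toy problem (all assumptions): move a 1-D latent from 0.0 toward a goal at 1.0.
GOAL = 1.0

def toy_propose(z: Latent, rng: random.Random) -> Action:
    return [rng.uniform(-0.5, 0.5)]

def toy_dynamics(z: Latent, a: Action) -> Latent:
    return [z[0] + a[0]]

def toy_success(latents: Sequence[Latent]) -> float:
    return 1.0 / (1.0 + abs(latents[-1][0] - GOAL))

def toy_cost(actions: Sequence[Action]) -> float:
    return sum(abs(a[0]) for a in actions)


best = plan_and_act([0.0], toy_propose, toy_dynamics, toy_success, toy_cost,
                    num_candidates=32)
single = plan_and_act([0.0], toy_propose, toy_dynamics, toy_success, toy_cost,
                      num_candidates=1)
```

In this sketch, only the actions of the returned candidate would be executed. A purely reactive policy corresponds roughly to the `num_candidates=1` case, where the first proposal is accepted without comparison against alternatives.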

Robotics & Embodied AI · Tool Use & Agents · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 42

Year: 2026

Venue: N/A
