Jul 7, 2025 · arXiv:2507.05607

Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik's Cube

AI Summary

This paper introduces Auto-RubikAI, a modular framework for Rubik's Cube restoration that combines a symbolic Knowledge Base (KB) for group-theoretic solving, a Vision-Language Model (VLM) for parsing RGB-D scenes, and a Large Language Model (LLM) that generates robotic control code. Deployed on a 7-DOF robotic arm in both simulation and real-world settings, the system reports a 79% end-to-end task success rate across randomized configurations and transfers from simulation to the real robot without retraining. Compared with CFOP, DeepCubeA, and Two-Phase baselines, the KB-enhanced method reduces the average number of solution steps while maintaining interpretability and safety.

Key Contribution

Instead of end-to-end training, this modular framework combines a symbolic KB, a VLM, and an LLM to solve Rubik's Cube with a 79% end-to-end success rate, and it reports fewer average solution steps than the CFOP, DeepCubeA, and Two-Phase baselines.

Abstract

This paper presents Auto-RubikAI, a modular autonomous planning framework that integrates a symbolic Knowledge Base (KB), a vision-language model (VLM), and a large language model (LLM) to solve structured manipulation tasks exemplified by Rubik's Cube restoration. Unlike traditional robot systems based on predefined scripts, or modern approaches relying on pretrained networks and large-scale demonstration data, Auto-RubikAI enables interpretable, multi-step task execution with minimal data requirements and no prior demonstrations. The proposed system employs a KB module to solve group-theoretic restoration steps, overcoming LLMs' limitations in symbolic reasoning. A VLM parses RGB-D input to construct a semantic 3D scene representation, while the LLM generates structured robotic control code via prompt chaining. This tri-module architecture enables robust performance under spatial uncertainty. We deploy Auto-RubikAI in both simulation and real-world settings using a 7-DOF robotic arm, demonstrating effective Sim-to-Real adaptation without retraining. Experiments show a 79% end-to-end task success rate across randomized configurations. Compared to CFOP, DeepCubeA, and Two-Phase baselines, our KB-enhanced method reduces average solution steps while maintaining interpretability and safety. Auto-RubikAI provides a cost-efficient, modular foundation for embodied task planning in smart manufacturing, robotics education, and autonomous execution scenarios. Code, prompts, and hardware modules will be released upon publication.
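
The hand-off between a symbolic solver and a control layer can be illustrated with a minimal sketch. It is not the paper's implementation: the names `RotateCommand`, `invert_sequence`, and `to_commands` are hypothetical, and inverting a known scramble merely stands in for the KB's group-theoretic solver. The sketch relies on one standard group-theory fact, that the inverse of a move sequence is the reversed sequence of inverse moves, and then maps each move in standard face-turn notation to a structured rotation command.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Quarter-turn counts for the standard face-turn suffixes: R, R2, R'.
_TURNS = {"": 1, "2": 2, "'": 3}
_SUFFIX = {1: "", 2: "2", 3: "'"}
_DEGREES = {1: 90, 2: 180, 3: -90}
FACES = "UDLRFB"


@dataclass(frozen=True)
class RotateCommand:
    """Hypothetical structured command for the control layer (an assumption)."""
    face: str
    degrees: int  # positive = clockwise when looking at the face


def parse_move(move: str) -> Tuple[str, int]:
    """Split a move such as "F'" into its face and clockwise quarter-turn count."""
    if not move or move[0] not in FACES or move[1:] not in _TURNS:
        raise ValueError(f"unknown move: {move!r}")
    return move[0], _TURNS[move[1:]]


def invert_sequence(moves: List[str]) -> List[str]:
    """Return the inverse sequence: reverse the order and invert each move."""
    inverted = []
    for move in reversed(moves):
        face, turns = parse_move(move)
        # A face turn has order 4, so the inverse of k quarter turns is 4 - k.
        inverted.append(face + _SUFFIX[4 - turns])
    return inverted


def to_commands(moves: List[str]) -> List[RotateCommand]:
    """Translate symbolic moves into structured rotation commands."""
    commands = []
    for move in moves:
        face, turns = parse_move(move)
        commands.append(RotateCommand(face, _DEGREES[turns]))
    return commands


if __name__ == "__main__":
    scramble = ["R", "U2", "F'"]
    solution = invert_sequence(scramble)  # stand-in for the KB solver
    for command in to_commands(solution):
        print(command)
```

Keeping the symbolic move list separate from the command objects mirrors the modular split described in the abstract: the solver's output stays human-readable, and only the last step depends on the robot.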

Multimodal Models · Robotics & Embodied AI · Tool Use & Agents

Citation Metrics

Citations: 3

Influential citations: 1

References: 49

Year: 2025

Venue: arXiv.org
