Tufts | Feb 22, 2026 | arXiv:2602.19260

The Price Is Not Right: Neuro-Symbolic Methods Outperform VLAs on Structured Long-Horizon Manipulation Tasks with Significantly Lower Energy Consumption

AI Summary

This paper compares a fine-tuned open-weight Vision-Language-Action (VLA) model against a neuro-symbolic architecture combining PDDL-based planning and learned low-level control on structured Towers of Hanoi manipulation tasks. The neuro-symbolic model significantly outperforms the VLA in success rate (95% vs 34% on 3-block, 78% vs 0% on 4-block) and demonstrates better generalization to unseen task variants. Furthermore, the VLA model consumes nearly two orders of magnitude more energy during training than the neuro-symbolic approach, highlighting the energy efficiency benefits of structured reasoning.

Key Contribution

Neuro-symbolic methods outperform VLAs on structured long-horizon manipulation, achieving 95% success versus 34% on the 3-block task while consuming nearly two orders of magnitude less energy during training.

Abstract

Vision-Language-Action (VLA) models have recently been proposed as a pathway toward generalist robotic policies capable of interpreting natural language and visual inputs to generate manipulation actions. However, their effectiveness and efficiency on structured, long-horizon manipulation tasks remain unclear. In this work, we present a head-to-head empirical comparison between a fine-tuned open-weight VLA model π0 and a neuro-symbolic architecture that combines PDDL-based symbolic planning with learned low-level control. We evaluate both approaches on structured variants of the Towers of Hanoi manipulation task in simulation while measuring both task performance and energy consumption during training and execution. On the 3-block task, the neuro-symbolic model achieves 95% success compared to 34% for the best-performing VLA. The neuro-symbolic model also generalizes to an unseen 4-block variant (78% success), whereas both VLAs fail to complete the task. During training, VLA fine-tuning consumes nearly two orders of magnitude more energy than the neuro-symbolic approach. These results highlight important trade-offs between end-to-end foundation-model approaches and structured reasoning architectures for long-horizon robotic manipulation, emphasizing the role of explicit symbolic structure in improving reliability, data efficiency, and energy efficiency. Code and models are available at https://price-is-not-right.github.io
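To make the "long-horizon" aspect concrete, here is a minimal sketch of the kind of move sequence a symbolic planner would hand to low-level controllers. It assumes the classic Towers of Hanoi rules and uses a plain recursive solver rather than PDDL. It is not the authors' planner, the peg names `A`, `B`, `C` and the helper names are hypothetical, and the paper's structured variants may differ from the classic puzzle. Under these assumptions the optimal plan has 2^n - 1 moves, so adding a fourth block roughly doubles the horizon.

```python
from typing import Dict, List, Tuple

# A move is (block size, source peg, target peg); block 1 is the smallest.
Move = Tuple[int, str, str]


def hanoi_plan(n: int, src: str = "A", dst: str = "C", aux: str = "B") -> List[Move]:
    """Return the optimal move sequence for the classic n-block puzzle."""
    if n == 0:
        return []
    # Move the n-1 smaller blocks out of the way, move block n, then restack.
    return (
        hanoi_plan(n - 1, src, aux, dst)
        + [(n, src, dst)]
        + hanoi_plan(n - 1, aux, dst, src)
    )


def is_valid(plan: List[Move], n: int, src: str = "A") -> bool:
    """Replay a plan and check that no block is ever placed on a smaller one."""
    stacks: Dict[str, List[int]] = {"A": [], "B": [], "C": []}
    stacks[src] = list(range(n, 0, -1))  # largest block at the bottom
    for block, s, d in plan:
        if not stacks[s] or stacks[s][-1] != block:
            return False  # the block is not on top of its source peg
        if stacks[d] and stacks[d][-1] < block:
            return False  # would cover a smaller block
        stacks[d].append(stacks[s].pop())
    return True


if __name__ == "__main__":
    for n in (3, 4):
        plan = hanoi_plan(n)
        print(f"{n} blocks: {len(plan)} moves, valid={is_valid(plan, n)}")
```

In a neuro-symbolic pipeline of the kind the abstract describes, each tuple in such a plan would correspond to one pick-and-place skill invocation, so a single failed skill is localized to one step instead of derailing an end-to-end policy.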

Multimodal Models | Robotics & Embodied AI | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
