Mar 10, 2026 · arXiv:2603.09513

Beyond Short-Horizon: VQ-Memory for Robust Long-Horizon Manipulation in Non-Markovian Simulation Benchmarks

Wang Honghui, Jing Zhi, Ao Jicong, Song Shiji, Li Xuelong, Huang Gao, Bai Chenjia

AI Summary

The paper introduces RuleSafe, a simulation benchmark of long-horizon, non-Markovian articulated-object manipulation tasks whose rules are generated with LLMs. To address the challenges RuleSafe poses, the authors propose VQ-Memory, a compact temporal representation that uses VQ-VAEs to encode past proprioceptive states into discrete latent tokens. Experiments on RuleSafe show that integrating VQ-Memory with VLA models and diffusion policies improves long-horizon planning, generalization, and efficiency.

Key Contribution

Forget pick-and-place: RuleSafe, a new benchmark featuring LLM-generated safe-cracking tasks, exposes the long-horizon planning weaknesses of current robot learning methods.

Abstract

The high cost of collecting real-robot data has made robotic simulation a scalable platform for both evaluation and data generation. Yet most existing benchmarks concentrate on simple manipulation tasks such as pick-and-place, failing to capture the non-Markovian characteristics of real-world tasks and the complexity of articulated object interactions. To address this limitation, we present RuleSafe, a new articulated manipulation benchmark built upon a scalable LLM-aided simulation framework. RuleSafe features safes with diverse unlocking mechanisms, such as key locks, password locks, and logic locks, which require different multi-stage reasoning and manipulation strategies. These LLM-generated rules produce non-Markovian and long-horizon tasks that require temporal modeling and memory-based reasoning. We further propose VQ-Memory, a compact and structured temporal representation that uses vector-quantized variational autoencoders (VQ-VAEs) to encode past proprioceptive states into discrete latent tokens. This representation filters low-level noise while preserving high-level task-phase context, providing lightweight yet robust temporal cues that are compatible with existing Vision-Language-Action (VLA) models. Extensive experiments on state-of-the-art VLA models and diffusion policies show that VQ-Memory consistently improves long-horizon planning, enhances generalization to unseen configurations, and enables more efficient manipulation with reduced computational cost. Project page: vqmemory.github.io
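To make the idea of encoding past proprioceptive states into discrete latent tokens more concrete, here is a minimal sketch of only the quantization step of a VQ-VAE: a nearest-neighbor lookup into a codebook. It is not the authors' implementation. The mean-pooling "encoder", the window size, the 2-D state layout, and all names (`quantize`, `encode_history`, `CODEBOOK`, `HISTORY`) are assumptions made for illustration; a real VQ-VAE learns both its encoder and its codebook.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector nearest to `latent`."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(latent, codebook[k]))


def encode_history(states: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]],
                   window: int) -> List[int]:
    """Turn a state history into one discrete token per non-overlapping window.

    Mean-pooling stands in for a learned encoder (an assumption of this sketch).
    Trailing states that do not fill a whole window are ignored.
    """
    tokens = []
    for start in range(0, len(states) - window + 1, window):
        chunk = states[start:start + window]
        dim = len(chunk[0])
        pooled = [sum(s[d] for s in chunk) / window for d in range(dim)]
        tokens.append(quantize(pooled, codebook))
    return tokens


# Hypothetical codebook with three entries, standing in for three task phases.
CODEBOOK = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

# Hypothetical 2-D proprioceptive readings over six time steps.
HISTORY = [[0.1, 0.0], [0.0, 0.1],
           [0.9, 0.1], [1.0, 0.0],
           [0.9, 1.0], [1.0, 0.9]]

TOKENS = encode_history(HISTORY, CODEBOOK, window=2)
print(TOKENS)  # prints [0, 1, 2]
```

In this toy setup, small fluctuations within a window map to the same token, which illustrates how a discrete code can drop low-level noise while keeping a coarse record of which phase the history passed through. A policy could then be conditioned on the short token sequence rather than on the raw state history.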

Eval Frameworks & Benchmarks · Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
