Mar 9, 2026 · arXiv:2603.07949

RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA models

Zihao Zheng, Sicheng Tian, Hangyu Cao, Chen-Yu Li, Chenyue Li, Jiayu Chen, Maoliang Li, Xinhao Sun, H. Zou, Hailong Zou, Guojie Luo, Xiang Chen

AI Summary

The paper introduces RAPID, a new Edge-Cloud Collaborative (ECC) inference framework designed to optimize Vision Language Action (VLA) model inference by addressing challenges related to visual noise and step-wise redundancy in embodied tasks. RAPID mitigates the impact of visual noise through redundancy-aware partitioning and preserves motion continuity by considering step-wise dependencies. Experimental results demonstrate a speedup of up to 1.73x with a minimal 5-7% overhead, showcasing the framework's efficiency.

Key Contribution

RAPID is an edge-cloud collaborative inference framework for VLA models that achieves a speedup of up to 1.73x with only 5-7% overhead by making edge-cloud partitioning robust to visual noise and preserving motion continuity across steps.

Abstract

Vision Language Action (VLA) models are mainstream in embodied intelligence but face high inference costs. Edge-Cloud Collaborative (ECC) inference offers an effective fix by easing edge-device computing pressure to meet real-time needs. However, existing ECC frameworks are suboptimal for VLA models due to two challenges: (1) mainstream environment-oriented edge-cloud partitioning methods are susceptible to interference from visual noise; (2) existing edge-cloud partitioning methods overlook the step-wise redundancy unique to embodied tasks, thereby disrupting the physical continuity of motion. To address these issues, we propose a novel ECC inference framework, termed RAPID. Specifically, we develop an implementation tailored to the proposed framework. Experiments demonstrate that RAPID achieves a speedup of up to 1.73x with only 5% to 7% overhead.

Inference & Quantization · Multimodal Models · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
