This paper introduces Assistive Urban Robot Autonomy (AURA), a multimodal shared autonomy framework for urban navigation that decomposes the task into high-level human instruction and low-level AI control. A key component is a Spatial-Aware Instruction Encoder that aligns human instructions with visual and spatial context. Trained on a newly constructed large-scale dataset, MM-CoS, AURA reduces manual operation effort, improves navigation stability, and decreases takeover frequency by more than 44% compared to existing methods.
Free humans from low-level control: AURA lets operators guide urban navigation with high-level instructions, slashing manual effort and boosting stability.
Long-horizon navigation in complex urban environments relies heavily on continuous human operation, which leads to fatigue, reduced efficiency, and safety concerns. Shared autonomy, in which a Vision-Language AI agent and a human operator collaborate to maneuver the mobile robot, offers a promising solution to these issues. However, existing shared autonomy methods often require humans and the AI to operate within the same action space, imposing high cognitive overhead. We present Assistive Urban Robot Autonomy (AURA), a new multimodal framework that decomposes urban navigation into high-level human instruction and low-level AI control. AURA incorporates a Spatial-Aware Instruction Encoder to align diverse human instructions with visual and spatial context. To facilitate training, we construct MM-CoS, a large-scale dataset comprising teleoperation data and vision-language descriptions. Experiments in simulation and the real world demonstrate that AURA effectively follows human instructions, reduces manual operation effort, and improves navigation stability, while enabling online adaptation. Moreover, under similar takeover conditions, our shared autonomy framework reduces takeover frequency by more than 44%. A demo video and more details are provided on the project page.
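To make the high-level/low-level decomposition concrete, here is a minimal sketch of how an instruction encoder could fuse a human instruction with visual and spatial context and hand a conditioning vector to a low-level controller. All module names, feature dimensions, and the cross-attention fusion scheme below are assumptions for illustration; the paper's actual Spatial-Aware Instruction Encoder and control policy may differ.

```python
# Hypothetical sketch of the instruction-conditioned control pipeline described
# in the abstract. Dimensions, fusion scheme, and action parameterization are
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class SpatialAwareInstructionEncoder(nn.Module):
    """Fuses a tokenized human instruction with visual and spatial features
    via cross-attention, producing a single conditioning vector."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.text_proj = nn.Linear(768, d_model)    # e.g. features from a frozen language model
        self.visual_proj = nn.Linear(512, d_model)  # e.g. features from an image backbone
        self.spatial_proj = nn.Linear(64, d_model)  # e.g. ego-centric occupancy / pose features
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, text_feats, visual_feats, spatial_feats):
        # text_feats: (B, T, 768), visual_feats: (B, V, 512), spatial_feats: (B, S, 64)
        q = self.text_proj(text_feats)
        kv = torch.cat([self.visual_proj(visual_feats),
                        self.spatial_proj(spatial_feats)], dim=1)
        fused, _ = self.cross_attn(q, kv, kv)   # instruction tokens attend to scene context
        return self.out(fused.mean(dim=1))      # (B, d_model) conditioning vector


class LowLevelPolicy(nn.Module):
    """Maps the fused instruction context plus the current observation to a
    continuous velocity command [linear, angular]."""

    def __init__(self, d_model: int = 256, obs_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model + obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, instruction_ctx, obs):
        return self.mlp(torch.cat([instruction_ctx, obs], dim=-1))


if __name__ == "__main__":
    enc, policy = SpatialAwareInstructionEncoder(), LowLevelPolicy()
    ctx = enc(torch.randn(1, 12, 768), torch.randn(1, 49, 512), torch.randn(1, 16, 64))
    cmd = policy(ctx, torch.randn(1, 128))
    print(cmd.shape)  # torch.Size([1, 2])
```

The key point of the decomposition is that the human only supplies the instruction that drives the conditioning vector, while the low-level policy issues the continuous commands, so the two never compete in the same action space.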