BeijingCHD | Apr 5, 2026 | arXiv:2604.03998

VA-FastNavi-MARL: Real-Time Robot Control with Multimedia-Driven Meta-Reinforcement Learning

Shengxi Jing, Fengxiang Wang, Yuan Feng, Hong Wang

AI Summary

VA-FastNavi-MARL uses meta-reinforcement learning to enable robots to rapidly adapt to new audio-visual instructions by treating them as a distribution of navigable goals. The key innovation is a modality-agnostic stream that aligns asynchronous audio-visual inputs into a unified latent representation, avoiding bottlenecks from heavy sensory processing. Experiments in a multi-arm workspace show VA-FastNavi-MARL achieves superior sample efficiency and real-time performance compared to baselines, even with noisy multimedia streams.

Key Contribution

Robots can respond to new audio-visual commands in real time, thanks to a meta-RL approach that bypasses the sensory processing bottleneck.

Abstract

Interpreting dynamic, heterogeneous multimedia commands with real-time responsiveness is critical for Human-Robot Interaction. We present VA-FastNavi-MARL, a framework that aligns asynchronous audio-visual inputs into a unified latent representation. By treating diverse instructions as a distribution of navigable goals via Meta-Reinforcement Learning, our method enables rapid adaptation to unseen directives with negligible inference overhead. Unlike approaches bottlenecked by heavy sensory processing, our modality-agnostic stream ensures seamless, low-latency control. Validation on a multi-arm workspace confirms that VA-FastNavi-MARL significantly outperforms baselines in sample efficiency and maintains robust, real-time execution even under noisy multimedia streams.
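As a rough illustration of what aligning asynchronous audio-visual inputs can involve, the sketch below pairs each control tick with the most recent sample from each stream (a zero-order hold) and concatenates the two into one fixed-size vector. This is not the paper's method: the abstract does not describe its encoder, and the function names `latest_at` and `fuse`, the sample rates, and the feature values are all hypothetical. In a learned system, a vector like this would typically be passed through an encoder to obtain the latent representation.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# One stream sample: (timestamp in seconds, feature vector).
Sample = Tuple[float, List[float]]


def latest_at(stream: Sequence[Sample], t: float, dim: int) -> List[float]:
    """Return the most recent feature vector with timestamp <= t.

    Assumes the stream is sorted by timestamp. Falls back to a zero
    vector of length `dim` if no sample has arrived by time t.
    """
    times = [ts for ts, _ in stream]
    i = bisect_right(times, t)
    return list(stream[i - 1][1]) if i > 0 else [0.0] * dim


def fuse(audio: Sequence[Sample], video: Sequence[Sample], t: float,
         audio_dim: int, video_dim: int) -> List[float]:
    """Build one fixed-size vector for control tick t from both streams."""
    return latest_at(audio, t, audio_dim) + latest_at(video, t, video_dim)


# Hypothetical streams arriving at different rates (illustrative values only).
audio = [(0.00, [0.1, 0.2]), (0.01, [0.3, 0.4]), (0.02, [0.5, 0.6])]
video = [(0.000, [1.0]), (0.033, [2.0])]

latent = fuse(audio, video, t=0.015, audio_dim=2, video_dim=1)
print(latent)  # [0.3, 0.4, 1.0]
```

Holding the latest sample keeps the per-tick cost to two binary searches, so the controller never waits on the slower stream. The trade-off is that the held sample may be stale, which a real system would need to bound or compensate for.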

Multimodal Models | Robotics & Embodied AI | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
