Differential Robotics · ZJU · Jun 1, 2026 · arXiv:2606.02313

Towards Precise Intent-Aligned VLA Aerial Navigation via Expert-Guided GRPO

Tianyang Chen, Wenjun Li, Xin Zhou, Yuze Wu, Fei Gao

AI Summary

This paper introduces EG-GRPO (Expert-Guided Group Relative Policy Optimization), a reinforcement learning framework for Vision-Language-Action (VLA) models in unmanned aerial vehicle (UAV) navigation that augments online rollouts with few-shot expert data. Targeting the data scarcity, limited generalization, and weak supervision of standard supervised fine-tuning (SFT), the authors raise the success rate to 2.13x that of the SFT baseline and improve intent alignment performance by 60.9% across tasks specified by complex human intents. A heterogeneous pipeline that runs simulation and inference in parallel also reduces rollout time by 43.5%.

Key Contribution

Expert-guided reinforcement learning raises the UAV navigation success rate to 2.13x that of the SFT baseline while improving intent alignment performance by 60.9%.

Abstract

Vision-Language-Action (VLA) models offer a promising end-to-end paradigm for unmanned aerial vehicles (UAVs) to accomplish complex tasks specified by fine-grained instructions. However, standard supervised fine-tuning (SFT) suffers from data scarcity, limited generalization, and weak supervision for nuanced and complicated human intents. Reinforcement fine-tuning offers a natural way to mitigate these challenges and align policy behaviors with human intents through designable feedback, but applying it to aerial navigation remains challenging due to inefficient exploration in expansive continuous spaces. To address these challenges, we introduce an efficient reinforcement learning (RL) framework for VLA-based aerial navigation. At its core, we propose EG-GRPO (Expert-Guided Group Relative Policy Optimization) to augment online rollouts with few-shot expert data. Additionally, we design a heterogeneous pipeline enabling parallel simulation and inference, which reduces rollout time by 43.5%. Across multiple tasks specified by complex human intents, EG-GRPO improves the success rate to 2.13x that of the SFT baseline, while improving intent alignment performance by 60.9%. These results demonstrate that our framework can move aerial navigation toward precise intent-aligned flight.
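The abstract does not spell out how expert data enters the update, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It uses the standard GRPO idea of standardizing each reward within its rollout group, and assumes (hypothetically) that expert trajectories are simply appended to the group before normalization. The function names, the scalar rewards, and the group composition are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def expert_augmented_advantages(rollout_rewards, expert_rewards):
    """Hypothetical augmentation: append expert rewards to the group,
    normalize over the combined group, then split the result back."""
    group = list(rollout_rewards) + list(expert_rewards)
    adv = group_relative_advantages(group)
    n = len(rollout_rewards)
    return adv[:n], adv[n:]


# Illustrative case: every online rollout fails (reward 0), which can
# happen when exploration in a large continuous space rarely succeeds.
online = [0.0, 0.0, 0.0, 0.0]

# Without expert data, all advantages are zero: no learning signal.
plain = group_relative_advantages(online)

# With one successful expert trajectory (assumed reward 1.0) in the group,
# failed rollouts get a negative advantage and the expert a positive one.
rollout_adv, expert_adv = expert_augmented_advantages(online, [1.0])
```

Under these assumptions, a group in which every rollout fails carries no gradient signal on its own, whereas adding even a single successful expert sample makes the advantages non-zero. Whether EG-GRPO mixes expert data in exactly this way is not stated in the abstract.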

Multimodal Models · RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
