The paper introduces APPLV, a method that uses a vision-language-action (VLA) model to predict parameters for classical motion planners, aiming to combine the safety of classical methods with the generalization capabilities of learned approaches. APPLV employs a regression head on top of a pre-trained vision-language model to map visual and linguistic inputs to planner parameters, which are then used to configure a classical planner. The model is trained using both supervised learning from navigation trajectories and reinforcement learning to optimize navigation performance.
By predicting parameters for classical planners, APPLV offers a way to leverage foundation models for robot navigation while retaining safety guarantees and outperforming end-to-end VLA models.
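The core idea, a regression head on frozen vision-language features that outputs bounded planner parameters, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding is a random stand-in for pooled VLM features, and the parameter names and ranges (e.g. DWA-style velocity and bias settings) are hypothetical.

```python
import numpy as np

# Hypothetical planner parameters and valid ranges (illustrative only;
# the actual parameter set depends on the classical planner being configured).
PARAM_RANGES = {
    "max_vel_x": (0.1, 2.0),          # m/s
    "inflation_radius": (0.05, 0.6),  # m
    "path_distance_bias": (1.0, 60.0),
}

rng = np.random.default_rng(0)
EMBED_DIM = 16  # stand-in for the VLM's pooled embedding size

# Regression head: a single linear layer over frozen VLM features.
W = rng.normal(scale=0.1, size=(EMBED_DIM, len(PARAM_RANGES)))
b = np.zeros(len(PARAM_RANGES))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_params(vlm_embedding):
    """Map a VLM embedding to planner parameters squashed into valid ranges."""
    raw = vlm_embedding @ W + b
    squashed = sigmoid(raw)  # each output in (0, 1)
    lo = np.array([r[0] for r in PARAM_RANGES.values()])
    hi = np.array([r[1] for r in PARAM_RANGES.values()])
    return dict(zip(PARAM_RANGES, lo + squashed * (hi - lo)))

# Stand-in for features a frozen vision-language model would produce.
embedding = rng.normal(size=EMBED_DIM)
params = predict_params(embedding)
```

Squashing through a sigmoid and rescaling guarantees every predicted parameter stays within the planner's valid range, which is one way the classical planner's safety properties survive the learned front end.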
Autonomous navigation in highly constrained environments remains challenging for mobile robots. Classical navigation approaches offer safety assurances but require environment-specific parameter tuning; end-to-end learning bypasses parameter tuning but struggles with precise control in constrained spaces. To address this, recent robot learning approaches automate parameter tuning while retaining classical systems' safety, yet still face challenges in generalizing to unseen environments. Recently, Vision-Language-Action (VLA) models have shown promise by leveraging foundation models' scene understanding capabilities, but still struggle with precise control and inference latency in navigation tasks. In this paper, we propose Adaptive Planner Parameter Learning from Vision-Language-Action Model (\textsc{applv}). Unlike traditional VLA models that directly output actions, \textsc{applv} leverages pre-trained vision-language models with a regression head to predict planner parameters that configure classical planners. We develop two training strategies: supervised fine-tuning from collected navigation trajectories and reinforcement learning fine-tuning to further optimize navigation performance. We evaluate \textsc{applv} across multiple motion planners on the simulated Benchmark Autonomous Robot Navigation (BARN) dataset and in physical robot experiments. Results demonstrate that \textsc{applv} outperforms existing methods in both navigation performance and generalization to unseen environments.
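The two training stages described above can be sketched in miniature: a supervised regression stage that fits the head to parameters recovered from demonstration trajectories, followed by a REINFORCE-style fine-tuning stage driven by a navigation reward. Everything here is an illustrative stand-in: the data is synthetic, the reward is a placeholder for actually running the planner, and the linear head abstracts away the VLM backbone.

```python
import numpy as np

rng = np.random.default_rng(1)
D, P = 8, 3  # embedding dim, number of planner parameters (illustrative)
W = rng.normal(scale=0.1, size=(D, P))  # linear regression head

# --- Stage 1: supervised fine-tuning from navigation trajectories ---
# (feature, target-parameter) pairs, standing in for demonstration data.
X = rng.normal(size=(32, D))
Y = rng.uniform(size=(32, P))
mse_before = np.mean((X @ W - Y) ** 2)
lr = 0.05
for _ in range(200):
    grad = X.T @ (X @ W - Y) / len(X)  # gradient of mean squared error
    W -= lr * grad
mse_after = np.mean((X @ W - Y) ** 2)

# --- Stage 2: reinforcement learning fine-tuning (REINFORCE sketch) ---
def navigation_reward(params):
    # Placeholder: the real reward would come from executing the
    # classical planner with these parameters and scoring the run.
    return -np.sum((params - 0.5) ** 2)

sigma = 0.1  # exploration noise on the predicted parameters
for _ in range(100):
    x = rng.normal(size=D)
    mean = x @ W
    sample = mean + sigma * rng.normal(size=P)
    r = navigation_reward(sample)
    # REINFORCE: grad of log N(sample; mean, sigma^2) w.r.t. mean.
    W += 0.01 * r * np.outer(x, (sample - mean) / sigma**2)
```

The supervised stage gives the head a sensible initialization from demonstrations; the RL stage then adjusts it using task-level feedback that supervised targets alone cannot capture, mirroring the two-stage recipe the abstract describes.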