Mar 30, 2026arXiv:2603.28625

Dynamic Lookahead Distance via Reinforcement Learning-Based Pure Pursuit for Autonomous Racing

AI Summary

This paper introduces a hybrid control framework for autonomous racing that combines Proximal Policy Optimization (PPO) with a classical Pure Pursuit controller to dynamically adjust the lookahead distance. The PPO agent learns to map vehicle speed and multi-horizon curvature to an optimal lookahead distance, balancing cornering and stability. Experiments in simulation and on a real 1:10-scale autonomous racing platform demonstrate that the learned policy improves lap-time performance and repeated lap completion on unseen tracks, achieving better results than fixed-lookahead and adaptive Pure Pursuit baselines.

Key Contribution

Forget hand-tuning: PPO can learn to dynamically adjust Pure Pursuit's lookahead distance for autonomous racing, improving lap times and generalization to unseen tracks.

Abstract

Pure Pursuit (PP) is a widely used path-tracking algorithm in autonomous vehicles due to its simplicity and real-time performance. However, its effectiveness is sensitive to the choice of lookahead distance: shorter values improve cornering but can cause instability on straights, while longer values improve smoothness but reduce accuracy in curves. We propose a hybrid control framework that integrates Proximal Policy Optimization (PPO) with the classical Pure Pursuit controller to adjust the lookahead distance dynamically during racing. The PPO agent maps vehicle speed and multi-horizon curvature features to an online lookahead command. It is trained using Stable-Baselines3 in the F1TENTH Gym simulator with a KL penalty and learning-rate decay for stability, then deployed in a ROS2 environment to guide the controller. Experiments in simulation compare the proposed method against both fixed-lookahead Pure Pursuit and an adaptive Pure Pursuit baseline. Additional real-car experiments compare the learned controller against a fixed-lookahead Pure Pursuit controller. Results show that the learned policy improves lap-time performance and repeated lap completion on unseen tracks, while also transferring zero-shot to hardware. The learned controller adapts the lookahead by increasing it on straights and reducing it in curves, demonstrating effectiveness in augmenting a classical controller by online adaptation of a single interpretable parameter. On unseen tracks, the proposed method achieved 33.16 s on Montreal and 46.05 s on Yas Marina, while tolerating more aggressive speed-profile scaling than the baselines and achieving the best lap times among the tested settings. Initial real-car experiments further support sim-to-real transfer on a 1:10-scale autonomous racing platform

Robotics & Embodied AI World Models & Planning

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Dynamic Lookahead Distance via Reinforcement Learning-Based Pure Pursuit for Autonomous Racing

Related Papers