Vanderbilt · Apr 6, 2026 · arXiv:2604.05062

GaussFly: Contrastive Reinforcement Learning for Visuomotor Policies in 3D Gaussian Fields

Yuhang Zhang, Yujing Shang, Chao Yan, Mir Feroskhan

AI Summary

GaussFly decouples representation learning from policy optimization for autonomous aerial vehicle (AAV) visuomotor control. It first reconstructs real-world scenes with 3D Gaussian Splatting (3DGS) under explicit geometric constraints, then trains a contrastive encoder on images rendered from these scenes to extract robust latent features. The pre-trained encoder supplies low-dimensional, noise-resilient features to the visuomotor policy, which reduces the policy's computational burden and makes it more robust to visual noise. In experiments, GaussFly achieves better sample efficiency and asymptotic performance than end-to-end baselines, and its policies transfer zero-shot from simulation to the real world.
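The summary does not state which contrastive objective the encoder is trained with. As an illustration only, the sketch below assumes the widely used InfoNCE loss (as in SimCLR- or CURL-style pre-training): embeddings of two augmented views of the same rendered frame should score higher with each other than with views of any other frame in the batch. The function names `l2_normalize` and `info_nce_loss`, the 2-D toy embeddings, and `temperature=0.1` are hypothetical choices, not taken from GaussFly.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (guarding against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchors[i] must pick out positives[i] among all positives.

    Assumption: this is a generic InfoNCE sketch, not GaussFly's exact objective.
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equally sized, non-empty batches")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every positive, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching index i as the target "class".
        total += log_denom - logits[i]
    return total / len(a)


# Hypothetical embeddings of two augmented views (e.g. noise or colour jitter)
# of the same three rendered frames.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
matched = info_nce_loss(view_a, view_b)
mismatched = info_nce_loss(view_a, list(reversed(view_b)))
print(matched < mismatched)  # True
```

Correctly paired views give a loss near zero, while shuffling the pairing drives it up; minimizing this loss is what pushes the encoder toward features that stay stable under visual perturbations.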

Key Contribution

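Instead of relying on domain-adaptation tricks, GaussFly uses a real-to-sim-to-real pipeline: 3D Gaussian Splatting reconstructions of real scenes provide photorealistic training environments, and a contrastively pre-trained encoder supplies noise-resilient features, so visuomotor policies trained in simulation transfer zero-shot to unseen real-world environments.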

Abstract

Learning visuomotor policies for Autonomous Aerial Vehicles (AAVs) relying solely on monocular vision is an attractive yet highly challenging paradigm. Existing end-to-end learning approaches directly map high-dimensional RGB observations to action commands, which frequently suffer from low sample efficiency and severe sim-to-real gaps due to the visual discrepancy between simulation and physical domains. To address these long-standing challenges, we propose GaussFly, a novel framework that explicitly decouples representation learning from policy optimization through a cohesive real-to-sim-to-real paradigm. First, to achieve a high-fidelity real-to-sim transition, we reconstruct training scenes using 3D Gaussian Splatting (3DGS) augmented with explicit geometric constraints. Second, to ensure robust sim-to-real transfer, we leverage these photorealistic simulated environments and employ contrastive representation learning to extract compact, noise-resilient latent features from the rendered RGB images. By utilizing this pre-trained encoder to provide low-dimensional feature inputs, the computational burden on the visuomotor policy is significantly reduced while its resistance against visual noise is inherently enhanced. Extensive experiments in simulated and real-world environments demonstrate that GaussFly achieves superior sample efficiency and asymptotic performance compared to baselines. Crucially, it enables robust and zero-shot policy transfer to unseen real-world environments with complex textures, effectively bridging the sim-to-real gap.
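The photorealistic training images come from rendering the 3DGS reconstruction. For orientation, here is a minimal sketch of the standard 3DGS per-pixel rule, front-to-back alpha compositing of depth-sorted Gaussians, C = Σ_i c_i α_i Π_{j<i} (1 − α_j). This is the generic rendering step of 3D Gaussian Splatting, not GaussFly's code: projection, depth sorting, and the paper's geometric constraints are omitted, and `composite_pixel` is a hypothetical helper name.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def composite_pixel(splats: List[Tuple[float, RGB]]) -> RGB:
    """Front-to-back alpha compositing for one pixel.

    splats: (alpha, color) pairs already sorted nearest-first, where alpha is a
    Gaussian's opacity multiplied by its projected 2D footprint at this pixel.
    Implements C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j).
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # prod_{j<i} (1 - alpha_j): light still passing through
    for alpha, c in splats:
        weight = alpha * transmittance
        for k in range(3):
            color[k] += weight * c[k]
        transmittance *= 1.0 - alpha
        # Optional early exit once the pixel is effectively opaque.
        if transmittance < 1e-4:
            break
    return (color[0], color[1], color[2])


# A half-transparent red splat in front of a half-transparent blue one.
print(composite_pixel([(0.5, (1.0, 0.0, 0.0)), (0.5, (0.0, 0.0, 1.0))]))
# (0.5, 0.0, 0.25)
```

Nearer Gaussians occlude farther ones through the running transmittance, which is why the splats must be depth-sorted before compositing.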

Computer Vision · Robotics & Embodied AI · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 42

Year: 2026

Venue: N/A
