Mar 3, 2026 · arXiv:2603.02936

Self-supervised Domain Adaptation for Visual 3D Pose Estimation of Nano-drone Racing Gates by Enforcing Geometric Consistency

Nicholas Carlotti, Michele Antonazzi, Elia Cereda, Mirko Nava, Nicola Basilico, Daniele Palossi, Alessandro Giusti

AI Summary

This paper tackles sim-to-real domain adaptation for visual 3D pose estimation of nano-drone racing gates. The authors propose a self-supervised unsupervised domain adaptation (UDA) approach that uses onboard drone odometry to enforce geometric consistency between pose predictions made from different viewpoints in real-world image sequences. The method achieves an improvement of 40% in position and 37% in orientation over a baseline, and its benefit is already appreciable with as few as 10 minutes of real-world flight data.

Key Contribution

A self-supervised domain adaptation technique that uses onboard odometry to improve the real-world accuracy of visual gate pose estimation on nano-drones.

Abstract

We consider the task of visually estimating the relative pose of a drone racing gate in front of a nano-quadrotor, using a convolutional neural network pre-trained on simulated data to regress the gate's pose. Due to the sim-to-real gap, the pre-trained model underperforms in the real world and must be adapted to the target domain. We propose an unsupervised domain adaptation (UDA) approach using only real image sequences collected by the drone flying an arbitrary trajectory in front of a gate; sequences are annotated in a self-supervised fashion with the drone's odometry as measured by its onboard sensors. On this dataset, a state consistency loss enforces that two images acquired at different times yield pose predictions that are consistent with the drone's odometry. Results indicate that our approach outperforms other state-of-the-art (SoA) UDA approaches and achieves a low mean absolute error in position ($x = 26$ cm, $y = 28$ cm, $z = 10$ cm) and orientation ($\psi = 13^{\circ}$), an improvement of 40% in position and 37% in orientation over a baseline. The approach's effectiveness is appreciable with as few as 10 minutes of real-world flight data and yields models with an inference time of 30.4 ms (33 fps) when deployed aboard the Crazyflie 2.1 Brushless nano-drone.
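
The state consistency idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration: poses are taken to be 4-DoF tuples `(x, y, z, yaw)` (matching the reported x, y, z, and $\psi$ errors), the gate is assumed static, and the function names `compose` and `state_consistency_loss` are hypothetical. The sketch maps each predicted gate pose from the drone frame into the odometry frame and penalizes the disagreement between the two estimates.

```python
import numpy as np


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi


def compose(drone_pose, gate_rel):
    """Express a gate pose given in the drone frame in the odometry frame.

    Both poses are (x, y, z, yaw); this 4-DoF layout is an assumption.
    """
    dx, dy, dz, dyaw = drone_pose
    gx, gy, gz, gyaw = gate_rel
    c, s = np.cos(dyaw), np.sin(dyaw)
    return np.array([
        dx + c * gx - s * gy,     # rotate the relative offset by the drone yaw
        dy + s * gx + c * gy,
        dz + gz,                  # yaw-only rotation leaves z untouched
        wrap_angle(dyaw + gyaw),
    ])


def state_consistency_loss(pred_1, pred_2, odom_1, odom_2):
    """L1 disagreement between two odometry-frame estimates of a static gate.

    pred_i: gate pose predicted from image i, in the drone frame.
    odom_i: drone pose from onboard odometry at the time of image i.
    Illustrative stand-in only, not the loss used in the paper.
    """
    gate_1 = compose(odom_1, pred_1)
    gate_2 = compose(odom_2, pred_2)
    pos_err = np.abs(gate_1[:3] - gate_2[:3]).sum()
    yaw_err = np.abs(wrap_angle(gate_1[3] - gate_2[3]))
    return pos_err + yaw_err


# Hypothetical example: a gate at (2, 0, 1) with yaw 0 in the odometry frame,
# seen from two drone poses. Perfect predictions give a loss close to zero.
odom_1 = np.array([0.0, 0.0, 1.0, 0.0])
odom_2 = np.array([1.0, 1.0, 1.0, np.pi / 2])
pred_1 = np.array([2.0, 0.0, 0.0, 0.0])
pred_2 = np.array([-1.0, -1.0, 0.0, -np.pi / 2])
loss_consistent = state_consistency_loss(pred_1, pred_2, odom_1, odom_2)

# Shifting the second prediction by 10 cm along the drone's x axis
# makes the two estimates disagree by the same amount.
pred_2_off = pred_2 + np.array([0.1, 0.0, 0.0, 0.0])
loss_off = state_consistency_loss(pred_1, pred_2_off, odom_1, odom_2)
```

Because the loss only compares two predictions through the odometry, it needs no ground-truth gate pose, which is consistent with the self-supervised setting described in the abstract. In a training setup, the same computation would be written in a differentiable framework so that gradients reach the network producing `pred_1` and `pred_2`.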

Computer Vision · Data Curation & Synthetic Data · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 45

Year: 2026

Venue: N/A
