This paper introduces Aco2, an approach to autonomous aerial manipulation that uses Contextual Contrastive Meta Reinforcement Learning to let a quadrotor pick up, transport, and deliver varied payloads without human intervention. A contextual observation encoder infers a compact latent context from the quadrotor's recent interaction history, so the policy can adapt online to payload-dependent flight dynamics. A contrastive objective structures that context embedding around task-relevant variations, which improves generalization across diverse payloads. According to the authors, Aco2 is trained entirely in simulation with extensive domain randomization and deployed directly on a physical quadrotor without real-world fine-tuning.
Aco2 enables a quadrotor to adapt online to diverse payloads with a single policy, without manual calibration or explicit system identification.
Unmanned aerial vehicles (UAVs) are increasingly deployed in logistics, service robotics, and other real-world applications, creating a growing demand for autonomous payload acquisition and delivery. Existing approaches typically assume pre-attached payloads or rely on specialized grippers, leaving versatile end-to-end aerial delivery largely unresolved. The core difficulty is that different payloads induce highly variable flight dynamics, so a single policy must adapt online without manual calibration or explicit system identification. To this end, we study \textbf{A}utonomous \textbf{A}erial Manipulation via \textbf{Co}ntextual \textbf{Co}ntrastive Meta Reinforcement Learning (\textbf{\textit{Aco2}}), a fully autonomous aerial delivery setting in which a quadrotor equipped with a lightweight hook continuously picks up, transports, and delivers diverse handle-equipped objects between randomized locations, all without human intervention. First, we design a contextual observation encoder that infers a compact latent context from recent interaction history, enabling the policy to adapt online to payload-dependent dynamics. Second, to improve the quality of this context, we introduce a contrastive objective that structures the context embedding around task-relevant variations, improving generalization across diverse payloads without requiring explicit system identification. Trained entirely in simulation with extensive domain randomization, \textit{Aco2} can be directly deployed on a physical quadrotor without real-world fine-tuning.
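The abstract does not specify the encoder architecture or the exact form of the contrastive objective. As a rough illustration only, the sketch below assumes a mean-pooled linear encoder over a window of recent interaction features and an InfoNCE-style loss, in which two history windows recorded with the same payload form a positive pair and windows from other payloads in the batch act as negatives. The names `encode_context` and `info_nce`, the feature layout, and the temperature value are hypothetical and not taken from the paper.

```python
import math


def encode_context(history, weights):
    """Map a window of interaction features to a unit-norm latent context.

    history: list of per-step feature vectors (assumed layout, e.g. state and action).
    weights: list of rows of a linear map (a stand-in for a learned encoder).
    """
    dim = len(history[0])
    # Mean-pool the window over time.
    pooled = [sum(step[i] for step in history) / len(history) for i in range(dim)]
    # Linear projection into the latent context space.
    z = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    # L2-normalize so dot products are cosine similarities.
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchor i should match positive i; other positives are negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [
            sum(a * p for a, p in zip(anchor, pos)) / temperature
            for pos in positives
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


# Toy usage: two hypothetical payloads, two history windows each.
identity = [[1.0, 0.0], [0.0, 1.0]]
light_a = encode_context([[1.0, 0.1], [0.9, 0.0]], identity)
light_b = encode_context([[1.0, 0.0], [0.8, 0.1]], identity)
heavy_a = encode_context([[0.1, 1.0], [0.0, 0.9]], identity)
heavy_b = encode_context([[0.0, 1.0], [0.1, 0.8]], identity)

matched = info_nce([light_a, heavy_a], [light_b, heavy_b])
mismatched = info_nce([light_a, heavy_a], [heavy_b, light_b])
print(f"matched pairs: {matched:.4f}, mismatched pairs: {mismatched:.4f}")
```

In this toy setting the loss is lower when each anchor is paired with a window from the same payload. That ordering is the signal a contrastive objective of this kind would use to organize the context embedding by payload-dependent dynamics. In an actual training setup the encoder would be learned jointly with the policy rather than fixed as it is here.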