AdaTracker, an adaptive in-context policy learning framework for cross-embodiment active visual tracking, uses an Embodiment Context Encoder to infer embodiment-specific constraints from interaction history. This contextual representation then modulates a Context-Aware Policy, enabling zero-shot control on unseen embodiments. Experiments show AdaTracker outperforms state-of-the-art methods in cross-embodiment generalization, sample efficiency, and zero-shot adaptation in both simulation and real-world settings.
Achieve zero-shot cross-embodiment visual tracking by dynamically adapting control policies to inferred embodiment constraints, eliminating the need for per-robot training.
Realizing active visual tracking with a single unified model across diverse robots is challenging, as physical constraints and motion dynamics vary drastically from one platform to another. Existing approaches typically train separate models for each embodiment, leading to poor scalability and limited generalization. To address this, we propose AdaTracker, an adaptive in-context policy learning framework that robustly tracks targets across diverse robot morphologies. Our key insight is to explicitly model embodiment-specific constraints through an Embodiment Context Encoder, which infers them from recent interaction history. This contextual representation dynamically modulates a Context-Aware Policy, enabling it to infer optimal control actions for unseen embodiments in a zero-shot manner. To enhance robustness, we introduce two auxiliary objectives that ensure accurate context identification and temporal consistency. Experiments in both simulation and the real world demonstrate that AdaTracker significantly outperforms state-of-the-art methods in cross-embodiment generalization, sample efficiency, and zero-shot adaptation.
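The sketch below illustrates the two-module design described above in PyTorch; it is not the authors' code. The GRU-based history encoder, the layer sizes, and the FiLM-style scale-and-shift modulation of the policy are all illustrative assumptions about how a context encoder could condition a context-aware policy.

```python
import torch
import torch.nn as nn

class EmbodimentContextEncoder(nn.Module):
    """Infers a latent embodiment context from a recent observation-action history."""
    def __init__(self, obs_dim: int, act_dim: int, ctx_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, ctx_dim, batch_first=True)

    def forward(self, obs_hist: torch.Tensor, act_hist: torch.Tensor) -> torch.Tensor:
        # obs_hist: (B, T, obs_dim), act_hist: (B, T, act_dim)
        _, h = self.gru(torch.cat([obs_hist, act_hist], dim=-1))
        return h.squeeze(0)  # (B, ctx_dim) latent embodiment context

class ContextAwarePolicy(nn.Module):
    """Policy whose hidden features are modulated by the embodiment context
    (here via FiLM-style scale and shift, one plausible choice of modulation)."""
    def __init__(self, obs_dim: int, act_dim: int, ctx_dim: int = 64, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.film = nn.Linear(ctx_dim, 2 * hidden)  # per-channel scale and shift
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        scale, shift = self.film(ctx).chunk(2, dim=-1)
        return self.head(h * (1 + scale) + shift)

# Usage: on an unseen robot, the context is re-inferred from that robot's own
# history, so the same frozen policy adapts with no per-embodiment fine-tuning.
obs_dim, act_dim = 32, 4
encoder = EmbodimentContextEncoder(obs_dim, act_dim)
policy = ContextAwarePolicy(obs_dim, act_dim)
obs_hist = torch.randn(1, 16, obs_dim)   # dummy history from the new embodiment
act_hist = torch.randn(1, 16, act_dim)
action = policy(torch.randn(1, obs_dim), encoder(obs_hist, act_hist))
```

Because the context is recomputed from the deployment robot's own rollout, zero-shot transfer reduces to in-context inference rather than gradient updates, which is what makes the per-robot training unnecessary.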