This paper tackles the challenge of adapting point tracking models trained on synthetic data to real-world videos by introducing a verifier meta-model that assesses the reliability of tracker predictions. The verifier selects the most trustworthy trajectories from multiple pretrained trackers to generate high-quality pseudo-labels for fine-tuning. Experiments on real-world benchmarks demonstrate that this verifier-guided pseudo-labeling approach achieves state-of-the-art results with improved data efficiency compared to existing self-training methods.
Stop hand-labeling real-world videos: a meta-model can learn to verify tracker outputs and generate high-quality pseudo-labels for fine-tuning, achieving SOTA with less data.
Models for long-term point tracking are typically trained on large synthetic datasets. Their performance degrades on real-world videos, which differ in characteristics from the synthetic training data and lack dense ground-truth annotations. Self-training on unlabeled videos has been explored as a practical solution, but the quality of pseudo-labels strongly depends on the reliability of the teacher models, which varies across frames and scenes. In this paper, we address the problem of real-world fine-tuning and introduce a verifier, a meta-model that learns to assess the reliability of tracker predictions and guide pseudo-label generation. Given candidate trajectories from multiple pretrained trackers, the verifier evaluates them per frame and selects the most trustworthy predictions, resulting in high-quality pseudo-label trajectories. When applied to fine-tuning, verifier-guided pseudo-labeling substantially improves the quality of supervision and enables data-efficient adaptation to unlabeled videos. Extensive experiments on four real-world benchmarks demonstrate that our approach achieves state-of-the-art results while requiring less data than prior self-training methods. Project page: https://kuis-ai.github.io/track_on_r
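The per-frame selection step described above can be illustrated with a minimal sketch. It assumes the verifier has already produced one reliability score per tracker per frame; the learned verifier itself is not modeled here, and the names `select_pseudo_labels`, `candidates`, and `scores` are hypothetical, chosen for illustration rather than taken from the paper's code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def select_pseudo_labels(
    candidates: Sequence[Sequence[Point]],
    scores: Sequence[Sequence[float]],
) -> Tuple[List[Point], List[int]]:
    """Build one pseudo-label trajectory from several candidate trajectories.

    candidates[k][t] is the (x, y) position predicted by tracker k at frame t.
    scores[k][t] is an assumed verifier reliability score for that prediction
    (higher means more trustworthy).
    Returns the selected trajectory and the index of the tracker chosen per frame.
    """
    num_trackers = len(candidates)
    num_frames = len(candidates[0])
    trajectory: List[Point] = []
    chosen: List[int] = []
    for t in range(num_frames):
        # Pick the tracker with the highest score at this frame.
        # On ties, max() keeps the lowest tracker index.
        best = max(range(num_trackers), key=lambda k: scores[k][t])
        chosen.append(best)
        trajectory.append(candidates[best][t])
    return trajectory, chosen


# Two hypothetical trackers following one query point over three frames.
candidates = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # tracker 0
    [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5)],  # tracker 1
]
scores = [
    [0.9, 0.2, 0.6],  # assumed verifier scores for tracker 0
    [0.1, 0.8, 0.6],  # assumed verifier scores for tracker 1
]
trajectory, chosen = select_pseudo_labels(candidates, scores)
print(chosen)      # [0, 1, 0]
print(trajectory)  # [(0.0, 0.0), (1.5, 1.5), (2.0, 2.0)]
```

Because the choice is made independently at each frame, the resulting trajectory can switch between trackers wherever their reliability changes, which is the property the abstract attributes to verifier-guided pseudo-labeling. How the actual method scores candidates or handles frames where no tracker is reliable is not specified in this text.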