The paper addresses unreliable supervision in self-supervised feed-forward scene flow estimation. It introduces TeFlow, a method that leverages multi-frame information: TeFlow aggregates temporally consistent motion cues from multiple frames into more reliable supervisory signals, which mitigates the abrupt changes in point correspondences across frames. Experiments on the Argoverse 2 and nuScenes datasets show that TeFlow achieves state-of-the-art performance among self-supervised feed-forward methods, with gains of up to 33%. It also matches leading optimization-based methods while running 150 times faster.
Forget unreliable two-frame supervision: by mining temporally consistent motion cues across multiple frames, TeFlow improves self-supervised feed-forward scene flow estimation by up to 33%.
Self-supervised feed-forward methods for scene flow estimation offer real-time efficiency, but their supervision, derived from two-frame point correspondences, is unreliable and often breaks down under occlusions. Multi-frame supervision could provide more stable guidance by incorporating motion cues from past frames. Yet naive extensions of two-frame objectives are ineffective because point correspondences vary abruptly across frames, producing inconsistent signals. In this paper, we present TeFlow, which enables multi-frame supervision for feed-forward models by mining temporally consistent supervision. TeFlow introduces a temporal ensembling strategy that forms reliable supervisory signals by aggregating the most temporally consistent motion cues from a candidate pool built across multiple frames. Extensive evaluations demonstrate that TeFlow establishes a new state of the art for self-supervised feed-forward methods, achieving performance gains of up to 33% on the challenging Argoverse 2 and nuScenes datasets. Our method performs on par with leading optimization-based methods while running 150 times faster. The code is open-sourced at https://github.com/KTH-RPL/OpenSceneFlow along with trained model weights.
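The abstract does not say how the candidate pool is built or how temporal consistency is scored. The toy NumPy sketch below illustrates the general idea of ensembling multi-frame motion cues. It is not the TeFlow implementation, which lives in the OpenSceneFlow repository. The sketch makes three assumptions:

- Each candidate is a per-point displacement from frame t to frame t+d, rescaled to a single step under a constant-velocity assumption.
- Consistency is scored as the number of other candidates within a distance threshold `tau`.
- The `top_k` most-supported candidates are averaged into the target.

The names `build_candidates`, `temporal_ensemble`, `tau`, and `top_k` are hypothetical.

```python
import numpy as np


def build_candidates(displacements: dict[int, np.ndarray]) -> np.ndarray:
    """Hypothetical candidate pool: stack per-point displacements to frame t+d
    (each of shape (N, 3)), divided by d so that every candidate approximates
    the one-step t -> t+1 flow under a constant-velocity assumption.
    Returns an array of shape (N, K, 3)."""
    return np.stack([disp / d for d, disp in sorted(displacements.items())], axis=1)


def temporal_ensemble(candidates: np.ndarray, tau: float = 0.1, top_k: int = 2) -> np.ndarray:
    """Toy consensus-based ensembling (an assumption, not TeFlow's exact rule).

    candidates: (N, K, 3) candidate flow vectors per point.
    Returns an (N, 3) pseudo-label: the mean of the top_k candidates that agree
    with the most other candidates (agreement = Euclidean distance < tau).
    """
    # Pairwise distances between the candidates of each point: (N, K, K)
    diff = candidates[:, :, None, :] - candidates[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Support = number of *other* candidates within tau (subtract the self-match)
    support = (dist < tau).sum(axis=-1) - 1  # (N, K)
    # Indices of the top_k most-supported candidates per point (stable tie-break)
    order = np.argsort(-support, axis=1, kind="stable")[:, :top_k]  # (N, top_k)
    chosen = np.take_along_axis(candidates, order[:, :, None], axis=1)  # (N, top_k, 3)
    return chosen.mean(axis=1)


if __name__ == "__main__":
    # One point moving +1 m per frame along x; the t-3 correspondence is an
    # occlusion-induced outlier that disagrees with the other three cues.
    disp = {
        1: np.array([[1.0, 0.0, 0.0]]),
        -1: np.array([[-1.0, 0.0, 0.0]]),
        -2: np.array([[-2.0, 0.0, 0.0]]),
        -3: np.array([[9.0, 0.0, 0.0]]),
    }
    cands = build_candidates(disp)
    print(temporal_ensemble(cands, tau=0.1, top_k=2))
```

In this example the three consistent cues support one another, while the outlier has no support. The outlier is therefore excluded and the target stays at one step along x. A plain average over all four candidates would instead be dragged toward the outlier. That difference is the motivation the abstract gives for selecting temporally consistent cues rather than naively pooling every frame.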