This paper introduces a collaborative reasoning framework, HyperSSM, to improve multi-object tracking (MOT) by addressing instability from noisy predictions and vulnerability to occlusions. HyperSSM integrates a Hypergraph module to capture spatial motion correlations and a State Space Model (SSM) to enforce temporal smoothness. Experiments on MOT17, MOT20, DanceTrack, and SportsMOT show state-of-the-art performance, demonstrating the framework's ability to stabilize trajectories and infer motion continuity under occlusion.
Multi-object tracking gets a boost: HyperSSM leverages collaborative reasoning to maintain robust object trajectories, even when visual cues disappear.
Motion reasoning serves as the cornerstone of multi-object tracking (MOT), as it enables consistent association of targets across frames. However, existing motion estimation approaches face two major limitations: (1) instability caused by noisy or probabilistic predictions, and (2) vulnerability under occlusion, where trajectories often fragment once visual cues disappear. To overcome these issues, we propose a collaborative reasoning framework that enhances motion estimation through joint inference among multiple correlated objects. By allowing objects with similar motion states to mutually constrain and refine each other, our framework stabilizes noisy trajectories and infers plausible motion continuity even when a target is occluded. To realize this concept, we design HyperSSM, an architecture that integrates Hypergraph computation and a State Space Model (SSM) for unified spatial-temporal reasoning. The Hypergraph module captures spatial motion correlations through dynamic hyperedges, while the SSM enforces temporal smoothness via structured state transitions. This synergistic design enables simultaneous optimization of spatial consensus and temporal coherence, resulting in robust and stable motion estimation. Extensive experiments on four mainstream and diverse benchmarks (MOT17, MOT20, DanceTrack, and SportsMOT), covering various motion patterns and scene complexities, demonstrate that our approach achieves state-of-the-art performance across a wide range of tracking scenarios.
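The two ingredients named in the abstract can be illustrated with a small NumPy sketch. This is not the HyperSSM implementation, which is a learned architecture whose details are not given here. It is a hand-written toy under stated assumptions: hyperedges are formed by k-nearest neighbours in velocity space, spatial reasoning is a single round of mean aggregation over those hyperedges, and the state space model is a fixed constant-velocity transition. All function names (`knn_incidence`, `hypergraph_smooth`, `ssm_rollout`) are hypothetical.

```python
import numpy as np


def knn_incidence(velocities, k=2):
    """Build an incidence matrix H of shape (n_objects, n_hyperedges).

    Assumption: hyperedge j groups object j with its k nearest
    neighbours in velocity space, so objects with similar motion
    share a hyperedge. The hyperedges are "dynamic" only in the sense
    that H is rebuilt from the current velocities each frame.
    """
    n = len(velocities)
    diffs = velocities[:, None, :] - velocities[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    H = np.zeros((n, n))
    for j in range(n):
        members = np.argsort(dists[j])[: k + 1]
        H[members, j] = 1.0
        H[j, j] = 1.0  # make sure object j always belongs to its own hyperedge
    return H


def hypergraph_smooth(velocities, H):
    """One node -> hyperedge -> node round of mean aggregation."""
    edge_degree = H.sum(axis=0)[:, None]   # number of objects per hyperedge
    node_degree = H.sum(axis=1)[:, None]   # number of hyperedges per object
    edge_feat = (H.T @ velocities) / edge_degree  # mean velocity of each group
    return (H @ edge_feat) / node_degree          # mean over an object's groups


def ssm_rollout(state, n_steps, dt=1.0):
    """Roll a linear state space model x_t = A x_{t-1} forward.

    Assumption: state is [px, py, vx, vy] and A is a constant-velocity
    transition. With no observation available (occlusion), the state
    is propagated by A alone.
    """
    A = np.array(
        [
            [1.0, 0.0, dt, 0.0],
            [0.0, 1.0, 0.0, dt],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ]
    )
    states = []
    for _ in range(n_steps):
        state = A @ state
        states.append(state)
    return np.stack(states)


if __name__ == "__main__":
    # Three objects moving roughly to the right with noisy velocity estimates.
    noisy_v = np.array([[1.0, 0.0], [1.2, 0.1], [0.8, -0.1]])
    H = knn_incidence(noisy_v, k=2)
    smoothed_v = hypergraph_smooth(noisy_v, H)
    print(smoothed_v)

    # Continue the first object's track through three occluded frames.
    start = np.array([0.0, 0.0, *smoothed_v[0]])
    print(ssm_rollout(start, n_steps=3))
```

With three objects and `k=2`, every hyperedge contains all three objects, so the smoothing step pulls each noisy velocity to the group mean. The rollout then extends the track without any visual observation, which is the intuition behind inferring motion continuity under occlusion. A learned model would replace both the fixed aggregation and the fixed transition matrix with trained components.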