NUS, HUST, La Trobe University | Apr 14, 2026 | arXiv:2604.12665

Hypergraph-State Collaborative Reasoning for Multi-Object Tracking

Zikai Song, Yi-Ping Phoebe Chen, Xinchao Wang

AI Summary

This paper introduces a collaborative reasoning framework, HyperSSM, to improve multi-object tracking (MOT) by addressing instability from noisy predictions and vulnerability to occlusions. HyperSSM integrates a Hypergraph module to capture spatial motion correlations and a State Space Model (SSM) to enforce temporal smoothness. Experiments on MOT17, MOT20, DanceTrack, and SportsMOT show state-of-the-art performance, demonstrating the framework's ability to stabilize trajectories and infer motion continuity under occlusion.

Key Contribution

Multi-object tracking gets a boost: HyperSSM leverages collaborative reasoning to maintain robust object trajectories, even when visual cues disappear.

Abstract

Motion reasoning serves as the cornerstone of multi-object tracking (MOT), as it enables consistent association of targets across frames. However, existing motion estimation approaches face two major limitations: (1) instability caused by noisy or probabilistic predictions, and (2) vulnerability under occlusion, where trajectories often fragment once visual cues disappear. To overcome these issues, we propose a collaborative reasoning framework that enhances motion estimation through joint inference among multiple correlated objects. By allowing objects with similar motion states to mutually constrain and refine each other, our framework stabilizes noisy trajectories and infers plausible motion continuity even when a target is occluded. To realize this concept, we design HyperSSM, an architecture that integrates Hypergraph computation and a State Space Model (SSM) for unified spatial-temporal reasoning. The Hypergraph module captures spatial motion correlations through dynamic hyperedges, while the SSM enforces temporal smoothness via structured state transitions. This synergistic design enables simultaneous optimization of spatial consensus and temporal coherence, resulting in robust and stable motion estimation. Extensive experiments on four mainstream and diverse benchmarks (MOT17, MOT20, DanceTrack, and SportsMOT), covering various motion patterns and scene complexities, demonstrate that our approach achieves state-of-the-art performance across a wide range of tracking scenarios.
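The abstract does not state the update equations, so the following is only a toy sketch of the two ideas it names: grouping objects with similar motion into hyperedges so they can refine each other, and applying a linear state transition for temporal smoothing. It is not the HyperSSM architecture. All names and parameters here (`build_hyperedges`, `hypergraph_smooth`, `ssm_step`, `radius`, `alpha`, `a`, `b`) are hypothetical, and the velocity-distance grouping rule is an assumption made for illustration.

```python
import math


def build_hyperedges(velocities, radius):
    """Assumed grouping rule: each object seeds one hyperedge holding every
    object whose velocity lies within `radius` of its own (itself included)."""
    return [
        [j for j, vj in enumerate(velocities) if math.dist(vi, vj) <= radius]
        for vi in velocities
    ]


def hypergraph_smooth(velocities, edges, alpha=0.5):
    """One round of node -> hyperedge -> node averaging.
    `alpha` (hypothetical) controls how strongly the group consensus
    pulls each object's velocity estimate."""
    dim = len(velocities[0])
    # Hyperedge feature: mean velocity of its member objects.
    edge_feats = [
        [sum(velocities[j][d] for j in edge) / len(edge) for d in range(dim)]
        for edge in edges
    ]
    smoothed = []
    for i, v in enumerate(velocities):
        # Every object belongs to at least its own hyperedge.
        incident = [f for edge, f in zip(edges, edge_feats) if i in edge]
        consensus = [sum(f[d] for f in incident) / len(incident) for d in range(dim)]
        smoothed.append([(1 - alpha) * v[d] + alpha * consensus[d] for d in range(dim)])
    return smoothed


def ssm_step(state, observation, a=0.8, b=0.2):
    """Simplified linear state update x_t = a * x_{t-1} + b * u_t per dimension.
    The scalar coefficients `a` and `b` are placeholders, not learned parameters."""
    return [a * s + b * u for s, u in zip(state, observation)]


# Two objects moving right at similar speeds, one moving left.
velocities = [[1.0, 0.0], [1.2, 0.0], [-1.0, 0.0]]
edges = build_hyperedges(velocities, radius=0.5)
refined = hypergraph_smooth(velocities, edges)

# Temporal smoothing of the first object's motion state.
state = ssm_step([0.0, 0.0], refined[0])
```

In this toy setup the two right-moving objects share hyperedges and pull each other's velocity toward their mean, while the left-moving object is left unchanged because it shares a hyperedge with no one. A learned model would replace the fixed distance threshold and scalar coefficients with trained components.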

Computer Vision | Reasoning & Chain-of-Thought | Robotics & Embodied AI

Year: 2026

Venue: N/A
