Soochow | Jun 17, 2026 | arXiv:2606.18566

Multi-Modal Hyper-Graph Fusion for Low-Light Crowd Counting

Hao-Yuan Ma, Li Zhang, Yushi Qiu, Jie Gao, Yan Zhang, Bangjun Wang

AI Summary

This paper addresses the challenge of crowd counting in low-light conditions by introducing a novel Multi-Modal Hyper-Graph Fusion approach, which integrates RGB, depth, and edge information to enhance counting accuracy. The authors constructed three new benchmarks, including two synthetic datasets and a real-world dataset, to evaluate their method's effectiveness. Experimental results show that their Low-Light Counting Network (LCNet) outperforms existing state-of-the-art techniques, highlighting the importance of multi-modal data in challenging environments.

Key Contribution

A multi-modal approach that adds depth and edge cues to RGB improves low-light crowd counting accuracy and, in the authors' experiments, outperforms existing methods.

Abstract

Crowd counting is a fundamental task in computer vision. However, crowd counting in low-light environments remains largely underexplored despite its practical importance. Existing methods mainly focus on well-lit scenes or rely on single-modality Red-Green-Blue (RGB) representations, which often become unreliable under extreme darkness and complex non-uniform illumination. To address this problem, we construct three new low-light crowd counting benchmarks: two synthetic datasets, SHA_Dark and SHB_Dark, and a real-world benchmark, LC-Crowd (Low-light Crowd Dataset). Inspired by Retinex-based physical modeling, we introduce depth and Canny edge cues as complementary geometric and structural priors to enhance the intrinsic reflectance representation under low-light conditions. We propose a Multi-Modal Hyper-Graph Fusion module, which formulates RGB appearance, depth geometry, and edge structure cues as nodes in a unified hyper-graph and explicitly captures their high-order complementary relationships via dynamic hyperedge construction and message passing. Furthermore, to allocate computation adaptively in dense prediction, we propose a Deformable Rectangular Sparse Attention (DRSA) module, which concentrates computation on informative regions through anchor-aware estimation and adaptive rectangular window modeling. Based on these designs, we develop a unified Low-Light Counting Network (LCNet) for robust low-light crowd counting. Extensive experiments on the three benchmarks demonstrate that the proposed method achieves the best overall performance against existing state-of-the-art (SOTA) methods. The code is in the supplementary material. The datasets will be made public upon acceptance.
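To make the hyper-graph idea in the abstract concrete, the following is a minimal sketch of generic hypergraph message passing over three modality nodes. It is not the authors' Multi-Modal Hyper-Graph Fusion module: the k-nearest-neighbour rule for building hyperedges, the unweighted mean aggregation, the function names, and the 2-D toy feature vectors are all assumptions made for illustration.

```python
import math


def knn_hyperedges(features, k):
    """Build one hyperedge per node: the node plus its k nearest
    neighbours in feature space (Euclidean distance).
    Assumption: a simple stand-in for 'dynamic hyperedge construction'."""
    edges = []
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        members = {i} | {j for _, j in dists[:k]}
        edges.append(sorted(members))
    return edges


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hypergraph_message_pass(features, edges):
    """One round of two-step message passing (unweighted, no learned
    parameters; an illustrative simplification)."""
    # Step 1: node -> hyperedge. Each hyperedge averages its member nodes.
    edge_feats = [mean([features[j] for j in e]) for e in edges]
    # Step 2: hyperedge -> node. Each node averages its incident hyperedges.
    # Every node belongs to its own hyperedge, so the list is never empty.
    out = []
    for i in range(len(features)):
        incident = [edge_feats[idx] for idx, e in enumerate(edges) if i in e]
        out.append(mean(incident))
    return out


# Hypothetical toy features for one spatial location:
# index 0 = RGB appearance, 1 = depth geometry, 2 = edge structure.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_edges = knn_hyperedges(toy_features, k=1)
fused = hypergraph_message_pass(toy_features, toy_edges)
```

Unlike an ordinary graph edge, a hyperedge can join more than two nodes at once, which is what lets a single aggregation step mix RGB, depth, and edge features jointly rather than pairwise. A trained model would replace the plain means with learned weights.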

Computer Vision | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
