GALAR-TemporalNet v2 tackles multi-label temporal classification in Video Capsule Endoscopy (VCE) by addressing class imbalance, long-range dependencies, and pathology-anatomy entanglement. The architecture combines windowed self-attention, a Dual-Graph GCN, and Bidirectional Mamba with an anatomy prototype residual pathway and a frame-level GCN skip connection. On the RARE-VISION test set, the redesigned v2 reaches mAP@0.5 of 0.3409 and mAP@0.95 of 0.3333, up from 0.2644 and 0.2353 for the competition version.
Bidirectional Mamba and dual-graph GCNs help untangle pathology from anatomy in video capsule endoscopy; the redesigned v2 raises mAP@0.5 by nearly 8 points (0.2644 to 0.3409) over the competition version.
Video Capsule Endoscopy (VCE) poses a challenging multi-label temporal classification problem, requiring simultaneous localization of 8 anatomical regions and detection of 9 pathological findings across tens of thousands of frames. We present GALAR-TemporalNet v2, a hierarchical temporal model that addresses three core challenges: extreme class imbalance, long-range temporal dependencies, and pathology-anatomy entanglement. Our architecture combines windowed self-attention for local modeling, a Dual-Graph GCN for global frame relationships, and Bidirectional Mamba for selective boundary context encoding. A novel anatomy prototype residual pathway decouples pathological deviation signals from normal organ appearance, and a frame-level GCN skip connection stabilizes training of visually confusable rare classes. The competition version, GALAR-TemporalNet, achieved an overall mAP@0.5 of 0.2644 and mAP@0.95 of 0.2353 on the RARE-VISION test set. Following the competition, the redesigned GALAR-TemporalNet v2 (incorporating a restructured pathology branch, refined loss functions, and extended post-processing) improved these results to mAP@0.5 of 0.3409 and mAP@0.95 of 0.3333.
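The abstract does not specify how the anatomy prototype residual pathway is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes one learned prototype vector per anatomical region, a softmax over per-frame anatomy logits as a soft assignment, and a plain subtraction of the expected "normal" appearance from the frame feature. The names `prototype_residual`, `anatomy_logits`, and `prototypes` are hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_residual(
    feature: Sequence[float],
    anatomy_logits: Sequence[float],
    prototypes: Sequence[Sequence[float]],
) -> List[float]:
    """Subtract the expected normal-organ appearance from one frame feature.

    ASSUMPTION: one prototype per anatomical region and a softmax soft
    assignment; the paper's actual formulation may differ.
    """
    if len(anatomy_logits) != len(prototypes):
        raise ValueError("need exactly one anatomy logit per prototype")
    weights = softmax(anatomy_logits)
    # Expected appearance: anatomy-probability-weighted mix of prototypes.
    expected = [
        sum(w * proto[d] for w, proto in zip(weights, prototypes))
        for d in range(len(feature))
    ]
    # The residual is what the anatomy prototypes do not explain.
    return [f - e for f, e in zip(feature, expected)]


# Toy example: 2 anatomical regions (the paper uses 8), 3-dimensional features.
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
frame = [1.0, 0.0, 0.5]
residual = prototype_residual(frame, [20.0, -20.0], prototypes)
```

In this toy case the frame is confidently assigned to the first region, so the residual is approximately `[0, 0, 0.5]`: the component matching the prototype cancels, and only the deviation remains for a downstream pathology branch to consume.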