Mar 30, 2026, arXiv:2603.28023

SegRGB-X: General RGB-X Semantic Segmentation Model

Jiong Liu, Yingjie Xu, Xingcheng Zhou, Rui Song, Walter Zimmer, Alois Knoll, Hu Cao

AI Summary

This paper introduces SegRGB-X, a unified semantic segmentation framework designed to handle arbitrary sensor modalities beyond RGB. The framework combines three components: a Modality-aware CLIP (MA-CLIP) for modality-specific scene understanding, Modality-aligned Embeddings for capturing fine-grained features, and a Domain-specific Refinement Module (DSRM) for dynamic feature adjustment. Experiments on five datasets with diverse modalities (event, thermal, depth, polarization, and light field) show state-of-the-art performance, with an mIoU of 65.03%, outperforming specialized multi-modal methods.
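For readers unfamiliar with the reported metric, mean Intersection over Union (mIoU) averages the per-class ratio of correctly labeled pixels to the union of predicted and ground-truth pixels. The sketch below shows one common way to compute it from a confusion matrix. It is illustrative only: the function name `mean_iou`, the ignore label 255, and the choice to skip classes that never appear are assumptions, and the paper's exact evaluation protocol is not stated here.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred and target are flat sequences of integer class ids, one per pixel.
    ignore_index (assumed to be 255 here) marks unlabeled ground-truth pixels.
    """
    # conf[t][p] counts pixels with ground-truth class t predicted as class p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]                                        # true positives
        fn = sum(conf[c]) - tp                                 # missed pixels of class c
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # wrongly predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel unlabeled.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In practice the confusion matrix is usually accumulated over the whole test set before the per-class ratios are taken, rather than averaged image by image.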

Key Contribution

A single model can achieve state-of-the-art semantic segmentation across diverse sensor modalities such as thermal, depth, and polarization, removing the need for modality-specific architectures.

Abstract

Semantic segmentation across arbitrary sensor modalities faces significant challenges due to diverse sensor characteristics, and traditional configurations for this task lead to redundant development effort. We address these challenges by introducing a universal framework that unifies semantic segmentation across arbitrary modalities. Our approach features three key innovations: (1) the Modality-aware CLIP (MA-CLIP), which provides modality-specific scene understanding guidance through LoRA fine-tuning; (2) Modality-aligned Embeddings for capturing fine-grained features; and (3) the Domain-specific Refinement Module (DSRM) for dynamic feature adjustment. Evaluated on five diverse datasets with different complementary modalities (event, thermal, depth, polarization, and light field), our model surpasses specialized multi-modal methods and achieves state-of-the-art performance with an mIoU of 65.03%. The code will be released upon acceptance.
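The abstract states that MA-CLIP is adapted through LoRA fine-tuning but gives no further detail. As background, the sketch below illustrates the generic LoRA idea on a single linear layer: the pretrained weight stays frozen and only a low-rank update is trained. It is not the authors' implementation. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the random stand-in for the pretrained weight are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, d_in, d_out, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Stand-in for a frozen pretrained weight, shape (d_out, d_in).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is small and random, B starts at zero,
        # so the adapted layer initially reproduces the frozen layer.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + scale * (B @ A), shape (d_out, d_in)."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.W, delta)]

    def forward(self, x):
        """Map x of shape (batch, d_in) to shape (batch, d_out)."""
        w_t = [list(col) for col in zip(*self.effective_weight())]
        return matmul(x, w_t)

    def trainable_params(self):
        """Number of parameters in the low-rank factors A and B."""
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


layer = LoRALinear(d_in=8, d_out=8, rank=2)
x = [[1.0] * 8]
before = layer.forward(x)   # equals the frozen layer's output, since B is zero
layer.B[0][0] = 1.0         # simulate one training update to B
after = layer.forward(x)    # only output unit 0 changes
```

With these toy sizes the low-rank factors hold 32 trainable values against 64 in the full weight matrix; the saving grows with layer width, which is the usual motivation for adapting a large pretrained model this way.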

Architecture Design (Transformers, SSMs, MoE); Computer Vision Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
