Feb 12, 2026 · arXiv:2602.11673

RI-Mamba: Rotation-Invariant Mamba for Robust Text-to-Shape Retrieval

Dasith de Silva Edirimuni, G. Hassan, Ajmal S. Mian

AI Summary

The paper introduces RI-Mamba, a rotation-invariant state-space model for text-to-shape retrieval that addresses the limitations of existing methods in handling objects with arbitrary orientations and diverse categories. RI-Mamba disentangles pose from geometry using global and local reference frames and Hilbert sorting to create rotation-invariant token sequences. The model incorporates orientational embeddings via feature-wise linear modulation and employs cross-modal contrastive learning with automated triplet generation for scalable training, achieving state-of-the-art results on the OmniObject3D benchmark.
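
Feature-wise linear modulation (FiLM), mentioned in the summary above, scales and shifts each feature channel using parameters predicted from a conditioning vector. The following is a minimal NumPy sketch of that general operation. The function name `film`, the use of single linear maps, and all shapes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: gamma(cond) * features + beta(cond), per token and channel.

    features: (n_tokens, d) token features
    cond:     (n_tokens, c) conditioning vectors (stand-ins for orientational embeddings)
    w_*, b_*: weights (c, d) and biases (d,) of two linear maps (assumed form)
    """
    gamma = cond @ w_gamma + b_gamma  # per-channel scale predicted from cond
    beta = cond @ w_beta + b_beta     # per-channel shift predicted from cond
    return gamma * features + beta

# Toy usage with random data (hypothetical sizes).
rng = np.random.default_rng(0)
n_tokens, d, c = 4, 8, 3
features = rng.normal(size=(n_tokens, d))
cond = rng.normal(size=(n_tokens, c))
out = film(features, cond,
           rng.normal(size=(c, d)), np.ones(d),
           rng.normal(size=(c, d)), np.zeros(d))
print(out.shape)
```

Because the modulation is applied independently to each token, its cost grows linearly with the number of tokens, which is consistent with the linear-time claim in the abstract.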

Key Contribution

RI-Mamba achieves state-of-the-art text-to-shape retrieval on the OmniObject3D benchmark by disentangling pose from geometry in point clouds, enabling robust search across more than 200 object categories under arbitrary orientations.

Abstract

3D assets have rapidly expanded in quantity and diversity due to the growing popularity of virtual reality and gaming. As a result, text-to-shape retrieval has become essential in facilitating intuitive search within large repositories. However, existing methods require canonical poses and support few object categories, limiting their real-world applicability where objects can belong to diverse classes and appear in random orientations. To address this challenge, we propose RI-Mamba, the first rotation-invariant state-space model for point clouds. RI-Mamba defines global and local reference frames to disentangle pose from geometry and uses Hilbert sorting to construct token sequences with meaningful geometric structure while maintaining rotation invariance. We further introduce a novel strategy to compute orientational embeddings and reintegrate them via feature-wise linear modulation, effectively recovering spatial context and enhancing model expressiveness. Our strategy is inherently compatible with state-space models and operates in linear time. To scale up retrieval, we adopt cross-modal contrastive learning with automated triplet generation, allowing training on diverse datasets without manual annotation. Extensive experiments demonstrate RI-Mamba's superior representational capacity and robustness, achieving state-of-the-art performance on the OmniObject3D benchmark across more than 200 object categories under arbitrary orientations. Our code will be made available at https://github.com/ndkhanh360/RI-Mamba.git.
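
One common way to obtain a rotation-invariant global reference frame is principal component analysis (PCA) of the point cloud. The abstract does not state how RI-Mamba constructs its frames, so the sketch below only illustrates the general idea of separating pose from geometry. The function name `canonicalize`, the skewness-based sign rule, and the synthetic data are all hypothetical.

```python
import numpy as np

def canonicalize(points):
    """Express an (N, 3) point cloud in its own PCA frame.

    Assumes distinct covariance eigenvalues and non-zero skewness along the
    first two principal axes; symmetric shapes would need another tie-break.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    a1, a2 = eigvecs[:, 2], eigvecs[:, 1]     # two largest-variance directions
    # Resolve the sign ambiguity of each axis: make the third moment positive.
    if ((centered @ a1) ** 3).sum() < 0:
        a1 = -a1
    if ((centered @ a2) ** 3).sum() < 0:
        a2 = -a2
    a3 = np.cross(a1, a2)                     # completes a right-handed frame
    frame = np.stack([a1, a2, a3], axis=1)    # columns are the frame axes
    return centered @ frame                   # pose-free coordinates

rng = np.random.default_rng(0)
# Skewed, anisotropic toy shape so the frame is well defined.
cloud = rng.exponential(size=(500, 3)) * np.array([3.0, 2.0, 1.0])

# Random proper rotation built from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] = -q[:, 0]
rotated = cloud @ q.T

# The canonical coordinates should agree for the original and rotated clouds.
print(np.allclose(canonicalize(cloud), canonicalize(rotated), atol=1e-8))
```

In a pipeline like the one the abstract describes, a space-filling-curve ordering such as Hilbert sorting would then be applied to coordinates of this kind, so that the token order does not depend on the input pose.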

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 43

Year: 2026

Venue: N/A
