Apr 13, 2026 · arXiv:2604.10951

Fast-SegSim: Real-Time Open-Vocabulary Segmentation for Robotics in Simulation

Xuan Yu, Yuxuan Xie, Shichao Zhai, Shuhao Ye, Rong Xiong

AI Summary

Fast-SegSim is a real-time open-vocabulary segmentation framework built upon 2D Gaussian Splatting. It targets the computational bottleneck of accumulating high-channel segmentation features in 3D reconstruction for robotics. It does so with two optimizations: Precise Tile Intersection, which reduces rasterization redundancy, and a Top-K Hard Selection strategy, which simplifies feature accumulation. The method renders at over 40 FPS, and labels generated by it are used to fine-tune the perception module in object goal navigation, doubling the navigation success rate.

Key Contribution

Real-time (over 40 FPS) open-vocabulary segmentation rendering for robotics simulation, with generated labels that double the success rate on a downstream object goal navigation task.

Abstract

Open-vocabulary panoptic reconstruction is crucial for advanced robotics and simulation. However, existing 3D reconstruction methods, such as NeRF or Gaussian Splatting variants, often struggle to achieve the real-time inference frequency required by robotic control loops. Existing methods incur prohibitive latency when processing the high-dimensional features required for robust open-vocabulary segmentation. We propose Fast-SegSim, a novel, simple, and end-to-end framework built upon 2D Gaussian Splatting, designed to realize real-time, high-fidelity, and 3D-consistent open-vocabulary segmentation reconstruction. Our core contribution is a highly optimized rendering pipeline that specifically addresses the computational bottleneck of high-channel segmentation feature accumulation. We introduce two key optimizations: Precise Tile Intersection to reduce rasterization redundancy, and a novel Top-K Hard Selection strategy. This strategy leverages the geometric sparsity inherent in the 2D Gaussian representation to greatly simplify feature accumulation and alleviate bandwidth limitations, achieving render rates exceeding 40 FPS. Fast-SegSim provides critical value in robotic applications: it serves as a high-frequency sensor input for simulation platforms like Gazebo, and its 3D-consistent outputs provide essential multi-view 'ground truth' labels for fine-tuning downstream perception tasks. We demonstrate this utility by using the generated labels to fine-tune the perception module in object goal navigation, successfully doubling the navigation success rate. Our superior rendering speed and practical utility underscore Fast-SegSim's potential to bridge the sim-to-real gap.
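
The abstract does not spell out how Top-K Hard Selection works, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes standard front-to-back alpha compositing for a single pixel and compares accumulating the feature vectors of every Gaussian against accumulating only the K Gaussians with the largest blending weights. The function names, the renormalization step, and the toy values are all assumptions made for illustration.

```python
def blend_weights(alphas):
    """Front-to-back compositing weights: w_i = a_i * prod_{j<i} (1 - a_j)."""
    weights, transmittance = [], 1.0
    for a in alphas:
        weights.append(a * transmittance)
        transmittance *= 1.0 - a
    return weights


def accumulate_full(alphas, features):
    """Baseline: weighted sum over every Gaussian's feature vector."""
    weights = blend_weights(alphas)
    channels = len(features[0])
    out = [0.0] * channels
    for w, f in zip(weights, features):
        for c in range(channels):
            out[c] += w * f[c]
    return out


def accumulate_top_k(alphas, features, k):
    """Assumed Top-K variant: keep the k largest-weight Gaussians, renormalize."""
    weights = blend_weights(alphas)
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    total = sum(weights[i] for i in top)
    channels = len(features[0])
    out = [0.0] * channels
    if total == 0.0:
        return out  # nothing visible at this pixel
    for i in top:
        for c in range(channels):
            out[c] += (weights[i] / total) * features[i][c]
    return out


# Toy pixel: three depth-sorted Gaussians with 2-channel features (hypothetical values).
alphas = [0.05, 0.9, 0.5]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = accumulate_full(alphas, features)
top1 = accumulate_top_k(alphas, features, k=1)
```

In this toy case the second Gaussian dominates the pixel, so the K = 1 result is simply its feature vector. The potential saving comes from reading K high-channel feature vectors per pixel instead of one per intersecting Gaussian, which is one way the bandwidth pressure mentioned in the abstract could be reduced.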

Computer Vision · Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
