Autodesk AI Lab, CUHK, DeepCybo, Hong Kong Centre for Logistics Robotics, Horizon Robotics, Lingnan University, Zhongguancun Academy, Zhongguancun Institute of Artificial

Jun 8, 2026

arXiv:2606.08980

EPS3D: End-to-End Feed-Forward 3D Panoptic Segmentation

Runsong Zhu, Jiaxin Guo, Xiaoyang Guo, Zhengzhe Liu, Ka-Hei Hui, Wei Yin, Kai Chen, Wei Chen, Weiqiang Ren, Yunhui Liu, Pheng-Ann Heng, Chi-Wing Fu

AI Summary

This paper presents EPS3D, an end-to-end feed-forward framework for open-vocabulary 3D panoptic segmentation. Instead of relying on additional preprocessing, it predicts 3D-aware semantic and instance features directly from multi-view images, using a distillation-based training strategy on diverse 3D scenes; this improves 3D consistency and avoids error accumulation. A separate mutual enhancement module enforces semantic-instance consistency by aligning semantics within instances (Ins2Sem) and refining instance features with semantic guidance (Sem2Ins). EPS3D outperforms state-of-the-art baselines on two benchmarks (e.g., +13% mean Intersection over Union (mIoU) for semantics on Replica) with high efficiency (e.g., 1 s per scene).
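For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU is commonly computed from per-point class labels. It is not the paper's evaluation script: the function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions, and benchmarks differ on that convention.

```python
import numpy as np


def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean Intersection over Union across classes (hypothetical helper).

    pred, gt: integer class labels of identical shape.
    Classes that appear in neither array are skipped (an assumed convention).
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


pred = np.array([0, 0, 1, 1])
gt = np.array([0, 1, 1, 1])
score = mean_iou(pred, gt, num_classes=2)  # class 0: 1/2, class 1: 2/3
```

Because the score averages over classes rather than points, a rare class counts as much as a frequent one.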

Key Contribution

EPS3D outperforms state-of-the-art baselines by +13% mIoU for semantics on the Replica benchmark while processing a scene in about one second.

Abstract

This paper introduces EPS3D, a new end-to-end feed-forward framework for open-vocabulary 3D panoptic segmentation. Unlike existing methods relying on additional preprocessing, we design an end-to-end architecture, with a distillation-based training strategy on diverse 3D scenes to predict 3D-aware semantic and instance features from multi-view images, improving 3D consistency and avoiding error accumulation. We further propose a mutual enhancement module to enforce inherent semantic-instance consistency. By aligning semantics within instances (Ins2Sem) and refining instance features with semantic guidance (Sem2Ins), we achieve more coherent 3D scene understanding. Ultimately, EPS3D outperforms SOTA baselines on two benchmarks (e.g., +13% mIoU for semantics on Replica) with high efficiency (e.g., 1s per scene), supporting tasks like robotic manipulation and 3D scene editing.
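The idea of aligning semantics within instances can be illustrated with a simple pooling step. The sketch below is an assumption, not the authors' Ins2Sem module: it replaces each point's semantic feature with the mean over all points sharing its instance id, so every instance ends up with one consistent semantic feature. The name `ins2sem_pool` and the array layout are hypothetical, and Sem2Ins is not sketched because the abstract does not describe its mechanism.

```python
import numpy as np


def ins2sem_pool(sem_feat: np.ndarray, inst_ids: np.ndarray) -> np.ndarray:
    """Average semantic features within each instance (illustrative only).

    sem_feat: (N, C) per-point semantic features.
    inst_ids: (N,) integer instance id per point.
    Returns an (N, C) array where points of the same instance share a feature.
    """
    # Map arbitrary instance ids to contiguous indices 0..K-1.
    _, inverse = np.unique(inst_ids, return_inverse=True)
    inverse = inverse.reshape(-1)
    num_instances = int(inverse.max()) + 1

    # Accumulate per-instance feature sums and point counts.
    sums = np.zeros((num_instances, sem_feat.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, sem_feat)
    counts = np.bincount(inverse, minlength=num_instances)

    # Broadcast each instance mean back to its points.
    means = sums / counts[:, None]
    return means[inverse]


sem_feat = np.array([[1.0, 0.0], [0.6, 0.4], [0.0, 1.0]])
inst_ids = np.array([5, 5, 9])
pooled = ins2sem_pool(sem_feat, inst_ids)
```

In this toy input the two points of instance 5 disagree slightly; after pooling both carry the same averaged feature, while the single point of instance 9 is unchanged.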

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
