USTC | Mar 17, 2026 | arXiv:2603.16868

MessyKitchens: Contact-rich object-level 3D scene reconstruction

Junaid Ahmed Ansari, Ran Ding, Fabio Pizzati, Ivan Laptev

AI Summary

The paper introduces MessyKitchens, a new dataset of real-world cluttered kitchen scenes with high-fidelity 3D object annotations covering shapes, poses, and contacts. It also proposes a Multi-Object Decoder (MOD) that extends the single-object SAM 3D approach to joint object-level scene reconstruction, explicitly addressing object contacts and penetrations. Experiments show that MessyKitchens offers higher registration accuracy and lower inter-object penetration than existing datasets, and that MOD consistently outperforms the state of the art in multi-object reconstruction on three datasets.

Key Contribution

The MessyKitchens dataset provides real-world cluttered scenes with accurate object-contact ground truth, and the Multi-Object Decoder (MOD) extends SAM 3D to reconstruct all objects in a scene jointly.

Abstract

Monocular 3D scene reconstruction has recently seen significant progress. Powered by modern neural architectures and large-scale data, recent methods achieve high performance in depth estimation from a single image. Meanwhile, reconstructing and decomposing common scenes into individual 3D objects remains a hard challenge due to the large variety of objects, frequent occlusions, and complex object relations. Notably, beyond shape and pose estimation of individual objects, applications in robotics and animation require physically plausible scene reconstruction where objects obey the physical principles of non-penetration and realistic contact. In this work, we advance object-level scene reconstruction along two directions. First, we introduce MessyKitchens, a new dataset of real-world cluttered scenes that provides high-fidelity object-level ground truth in terms of 3D object shapes, poses, and accurate object contacts. Second, we build on the recent SAM 3D approach for single-object reconstruction and extend it with a Multi-Object Decoder (MOD) for joint object-level scene reconstruction. To validate our contributions, we demonstrate that MessyKitchens significantly improves on previous datasets in registration accuracy and inter-object penetration. We also evaluate our multi-object reconstruction approach on three datasets and demonstrate consistent and significant improvements of MOD over the state of the art. Our new benchmark, code, and pre-trained models will be made publicly available on our project website: https://messykitchens.github.io/.
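
The abstract does not say how inter-object penetration is measured, so the following is only an illustrative sketch of one common proxy, not the paper's metric. It assumes each object is available as a set of surface points plus a signed distance function (negative inside the object), and it reports the deepest point-into-object penetration averaged over all ordered object pairs. All names here (`sphere_sdf`, `axis_points`, `penetration_depth`, `scene_penetration`) are hypothetical, and spheres stand in for real object meshes.

```python
import math


def sphere_sdf(center, radius):
    """Return a signed distance function for a sphere (negative inside)."""
    def sdf(point):
        return math.dist(point, center) - radius
    return sdf


def axis_points(center, radius):
    """Six surface points of a sphere, one along each +/- axis direction."""
    cx, cy, cz = center
    return [
        (cx + radius, cy, cz), (cx - radius, cy, cz),
        (cx, cy + radius, cz), (cx, cy - radius, cz),
        (cx, cy, cz + radius), (cx, cy, cz - radius),
    ]


def penetration_depth(points, sdf):
    """Deepest penetration of `points` into the shape described by `sdf`.

    Returns 0.0 when no point lies inside the shape.
    """
    return max(0.0, -min(sdf(p) for p in points))


def scene_penetration(objects):
    """Mean penetration depth over all ordered pairs of distinct objects.

    `objects` is a list of (surface_points, sdf) tuples.
    """
    depths = [
        penetration_depth(points_a, sdf_b)
        for i, (points_a, _) in enumerate(objects)
        for j, (_, sdf_b) in enumerate(objects)
        if i != j
    ]
    return sum(depths) / len(depths) if depths else 0.0


def make_sphere(center, radius):
    """Bundle sampled surface points and the SDF for a toy sphere object."""
    return axis_points(center, radius), sphere_sdf(center, radius)
```

As a usage example, two unit spheres whose centers are 1.5 apart overlap by 0.5 along the axis joining them, so `scene_penetration([make_sphere((0, 0, 0), 1.0), make_sphere((1.5, 0, 0), 1.0)])` evaluates to 0.5, while moving the second center to `(3, 0, 0)` gives 0.0. A physically plausible reconstruction in the abstract's sense would keep this kind of score near zero while still placing objects in contact.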

Computer Vision | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 63

Year: 2026

Venue: N/A
