Mar 16, 2026 | arXiv:2603.15603

Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery

Timing Yang, Sicheng He, Hongyi Jing, Jiawei Yang, Zhijian Liu, Chuhang Zou, Yue Wang

AI Summary

Fast SAM 3D Body accelerates the SAM 3D Body framework for monocular 3D human mesh recovery by decoupling spatial dependencies and applying architecture-aware pruning for parallelized feature extraction and streamlined transformer decoding. It replaces iterative mesh fitting with a direct feedforward mapping for joint-level kinematics extraction, achieving a 10,000x speedup in this conversion. The resulting framework achieves up to 10.9x end-to-end speedup while maintaining or surpassing the original SAM 3D Body's reconstruction fidelity.

Key Contribution

Achieves real-time full-body human mesh recovery from a single RGB stream with Fast SAM 3D Body, delivering up to a 10.9x end-to-end speedup over the original SAM 3D Body while maintaining on-par reconstruction fidelity.

Abstract

SAM 3D Body (3DB) achieves state-of-the-art accuracy in monocular 3D human mesh recovery, yet its inference latency of several seconds per image precludes real-time application. We present Fast SAM 3D Body, a training-free acceleration framework that reformulates the 3DB inference pathway to achieve interactive rates. By decoupling serial spatial dependencies and applying architecture-aware pruning, we enable parallelized multi-crop feature extraction and streamlined transformer decoding. Moreover, to extract the joint-level kinematics (SMPL) compatible with existing humanoid control and policy learning frameworks, we replace the iterative mesh fitting with a direct feedforward mapping, accelerating this specific conversion by over 10,000x. Overall, our framework delivers up to a 10.9x end-to-end speedup while maintaining on-par reconstruction fidelity, even surpassing 3DB on benchmarks such as LSPET. We demonstrate its utility by deploying Fast SAM 3D Body in a vision-only teleoperation system that, unlike methods reliant on wearable IMUs, enables real-time humanoid control and the direct collection of manipulation policies from a single RGB stream.
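The idea behind parallelized multi-crop feature extraction can be illustrated with a toy sketch: once crops no longer depend on each other's outputs, they can be stacked and encoded in a single batched call instead of one call per crop. This is not the authors' implementation. The encoder here is a hypothetical stand-in (a fixed linear projection), and the crop count, crop size, and function names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an image encoder: a fixed linear projection.
# The real 3DB encoder is a large vision backbone; this only mimics the
# property that each crop is encoded independently of the others.
W = rng.standard_normal((16 * 16 * 3, 8))

def toy_encoder(crops):
    """Encode a batch of crops with shape (N, 16, 16, 3) into (N, 8) features."""
    flat = crops.reshape(crops.shape[0], -1)
    return flat @ W

def encode_serial(crops):
    """One encoder call per crop, as in a serial pipeline."""
    return np.stack([toy_encoder(c[None])[0] for c in crops])

def encode_batched(crops):
    """Stack independent crops and encode them in a single call."""
    return toy_encoder(np.stack(crops))

# Three hypothetical crops of the same image (sizes chosen arbitrarily).
crops = [rng.standard_normal((16, 16, 3)) for _ in range(3)]

serial = encode_serial(crops)
batched = encode_batched(crops)
print(serial.shape, np.allclose(serial, batched))
```

Both paths produce the same features. The batched path simply exposes the work to the hardware as one larger operation, which is where a latency gain would come from on a GPU.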

Architecture Design (Transformers, SSMs, MoE) | Computer Vision Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
