Feb 24, 2026 · arXiv:2602.21100

Skullptor: High Fidelity 3D Head Reconstruction in Seconds with Multi-View Normal Prediction

Noé Artru, Rukhshanda Hussain, Emeline Got, Alexandre Messier, Abdallah Dib

AI Summary

The paper introduces Skullptor, a novel 3D head reconstruction method that combines multi-view normal prediction with inverse rendering optimization. It addresses the trade-off between efficiency and fidelity in existing methods by using a cross-view attention mechanism to extend monocular foundation models for geometrically consistent normal prediction. The predicted normals are then used as priors in an inverse rendering framework, enabling high-fidelity reconstruction from sparse views with reduced computational cost.
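The cross-view attention idea mentioned above can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the function name `cross_view_attention`, the single-head design, and the projection matrices `w_q`, `w_k`, `w_v` are all assumptions made for illustration. It only shows the general pattern of letting the image tokens of every view attend to the tokens of all views, which is one common way to share information across cameras.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(tokens, w_q, w_k, w_v):
    """Single-head attention over the tokens of all views jointly.

    tokens: (V, N, D) array, V views with N tokens of dimension D each.
    w_q, w_k, w_v: (D, D_out) projection matrices (hypothetical parameters).
    Returns the attended tokens (V, N, D_out) and the attention matrix
    (V*N, V*N), where each row is a distribution over all tokens of all views.
    """
    num_views, num_tokens, dim = tokens.shape
    # Flatten views so every token can attend to tokens from every view.
    flat = tokens.reshape(num_views * num_tokens, dim)
    q = flat @ w_q
    k = flat @ w_k
    v = flat @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    out = attn @ v
    return out.reshape(num_views, num_tokens, -1), attn
```

With three views of four tokens each, the attention matrix has shape 12 x 12, so a token from one camera can draw on features from the other two. A real model would typically use multiple heads, positional or camera encodings, and learned weights, none of which are specified in the summary.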

Key Contribution

Reconstructs 3D head geometry with fidelity on par with dense-view photogrammetry while using fewer cameras and less compute, via a hybrid approach that feeds multi-view normal predictions into inverse rendering optimization.

Abstract

Reconstructing high-fidelity 3D head geometry from images is critical for a wide range of applications, yet existing methods face fundamental limitations. Traditional photogrammetry achieves exceptional detail but requires extensive camera arrays (25-200+ views), substantial computation, and manual cleanup in challenging areas like facial hair. Recent alternatives present a fundamental trade-off: foundation models enable efficient single-image reconstruction but lack fine geometric detail, while optimization-based methods achieve higher fidelity but require dense views and expensive computation. We bridge this gap with a hybrid approach that combines the strengths of both paradigms. Our method introduces a multi-view surface normal prediction model that extends monocular foundation models with cross-view attention to produce geometrically consistent normals in a feed-forward pass. We then leverage these predictions as strong geometric priors within an inverse rendering optimization framework to recover high-frequency surface details. Our approach outperforms state-of-the-art single-image and multi-view methods, achieving high-fidelity reconstruction on par with dense-view photogrammetry while reducing camera requirements and computational cost. The code and model will be released.
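As a rough illustration of how predicted normals could act as a geometric prior during optimization, the sketch below computes a cosine-based discrepancy between normals rendered from the current mesh estimate and normals predicted by a network. The abstract does not state the actual loss used, so the function name `normal_prior_loss`, the cosine formulation, and the foreground mask are assumptions rather than the authors' formulation.

```python
import numpy as np


def normalize(n, eps=1e-8):
    # Scale each normal vector to unit length, guarding against zero vectors.
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), eps)


def normal_prior_loss(rendered, predicted, mask):
    """Mean (1 - cosine similarity) between two normal maps over masked pixels.

    rendered:  (H, W, 3) normals rendered from the current geometry estimate.
    predicted: (H, W, 3) normals from a prediction network (the prior).
    mask:      (H, W) boolean foreground mask (an assumed input).
    Returns 0.0 when the normals agree everywhere in the mask and 2.0 when
    they point in opposite directions.
    """
    cos = np.sum(normalize(rendered) * normalize(predicted), axis=-1)
    weights = mask.astype(np.float64)
    return float(np.sum((1.0 - cos) * weights) / np.maximum(weights.sum(), 1.0))
```

In an inverse rendering loop, a term like this would be evaluated per view and minimized with respect to the mesh vertices through a differentiable renderer, usually alongside other regularizers. Those components are outside the scope of this sketch.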

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
