Mar 16, 2026 | arXiv:2603.15433

Real-Time Human Frontal View Synthesis from a Single Image

Fangyu Lin, Yingdong Hu, Lunjie Zhu, Zhening Liu, Yushi Huang, Zehong Lin, Jun Zhang

AI Summary

PrismMirror is a geometry-guided framework for real-time frontal view synthesis of humans from a single image, designed to overcome limitations of existing rendering-centric and human-centric approaches. The method uses a cascade learning strategy for coarse-to-fine geometric feature learning: it first directly estimates coarse geometric features, such as SMPL-X meshes and point clouds, and then refines textures through rendering supervision. Distilled into a lightweight linear attention model, PrismMirror runs inference in real time at 24 FPS, and the authors report better visual authenticity and structural accuracy than previous methods.

Key Contribution

Ditch the multi-camera setup: PrismMirror synthesizes photorealistic frontal views of humans from a single image in real time, outperforming prior methods in both visual quality and geometric accuracy.

Abstract

Photorealistic human novel view synthesis from a single image is crucial for democratizing immersive 3D telepresence, eliminating the need for complex multi-camera setups. However, current rendering-centric methods prioritize visual fidelity over explicit geometric understanding and struggle with intricate regions like faces and hands, leading to temporal instability. Meanwhile, human-centric frameworks suffer from memory bottlenecks since they typically rely on an auxiliary model to provide informative structural priors for geometric modeling, which limits real-time performance. To address these challenges, we propose PrismMirror, a geometry-guided framework for instant frontal view synthesis from a single image. By avoiding external geometric modeling and focusing on frontal view synthesis, our model optimizes visual integrity for telepresence. Specifically, PrismMirror introduces a novel cascade learning strategy that enables coarse-to-fine geometric feature learning. It first directly learns coarse geometric features, such as SMPL-X meshes and point clouds, and then refines textures through rendering supervision. To achieve real-time efficiency, we distill this unified framework into a lightweight linear attention model. Notably, PrismMirror is the first monocular human frontal view synthesis model that achieves real-time inference at 24 FPS, significantly outperforming previous methods in both visual authenticity and structural accuracy.
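The abstract does not specify which linear attention variant the distilled model uses, so the following is only a minimal sketch of the general technique, assuming the common kernel formulation with an elu(x) + 1 feature map. The function names (`feature_map`, `linear_attention`, `quadratic_reference`) are hypothetical and are not taken from the paper. The point it illustrates is why linear attention suits a real-time budget (24 FPS leaves roughly 41.7 ms per frame): the key/value summaries are computed once and reused by every query, so cost grows linearly with the number of tokens instead of quadratically.

```python
import math


def feature_map(x):
    # elu(x) + 1, a positive feature map often used for linear attention.
    # Assumption: the paper's actual feature map is not given in the abstract.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Non-causal kernel attention in O(N * d * d_v) time.

    queries, keys: N vectors of length d; values: N vectors of length d_v.
    """
    d, d_v = len(keys[0]), len(values[0])
    # Summaries shared by all queries:
    #   S = sum_j phi(k_j) v_j^T   (d x d_v),   z = sum_j phi(k_j)   (d)
    S = [[0.0] * d_v for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(keys, values):
        pk = feature_map(k)
        for a in range(d):
            z[a] += pk[a]
            for b in range(d_v):
                S[a][b] += pk[a] * v[b]
    out = []
    for q in queries:
        pq = feature_map(q)
        norm = sum(pq[a] * z[a] for a in range(d))  # always > 0
        out.append(
            [sum(pq[a] * S[a][b] for a in range(d)) / norm for b in range(d_v)]
        )
    return out


def quadratic_reference(queries, keys, values):
    # Same kernel, but builds all N x N weights explicitly (O(N^2) cost).
    phi_k = [feature_map(k) for k in keys]
    d_v = len(values[0])
    out = []
    for q in queries:
        pq = feature_map(q)
        w = [sum(a * b for a, b in zip(pq, pk)) for pk in phi_k]
        total = sum(w)
        out.append(
            [sum(wj * v[b] for wj, v in zip(w, values)) / total for b in range(d_v)]
        )
    return out
```

Both functions compute the same output; only the order of the matrix products differs. In a distillation setting such as the one the abstract describes, a student built from layers like `linear_attention` would be trained to match a heavier teacher, but that training procedure is not detailed in the abstract and is not sketched here.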

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
