FaceCam is a system for generating portrait videos with customizable camera trajectories from monocular video input. It addresses the geometric distortions and artifacts common in existing video generation models by using a face-tailored, scale-aware representation of camera transformations that does not rely on 3D priors. The video generation model is trained on both multi-view studio captures and in-the-wild monocular videos, using synthetic camera motion and multi-shot stitching for data augmentation.
One camera, any angle: FaceCam renders portrait videos along customizable camera trajectories from a single-camera input, sidestepping common 3D reconstruction errors.
We introduce FaceCam, a system that generates videos along customizable camera trajectories from monocular human portrait video input. Recent camera-control approaches based on large video generation models have shown promising progress but often exhibit geometric distortions and visual artifacts on portrait videos, due to scale-ambiguous camera representations or 3D reconstruction errors. To overcome these limitations, we propose a face-tailored, scale-aware representation of camera transformations that provides deterministic conditioning without relying on 3D priors. We train a video generation model on both multi-view studio captures and in-the-wild monocular videos, and introduce two camera-control data generation strategies, synthetic camera motion and multi-shot stitching, to exploit stationary training cameras while generalizing to dynamic, continuous camera trajectories at inference time. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, and identity and motion preservation.
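The abstract does not spell out the face-tailored, scale-aware camera representation, so the following is only a minimal sketch of one way such a representation could work, not FaceCam's actual method. It assumes world-to-camera extrinsics (`x_cam = R @ x_world + t`) and a known 3D face center, computes the relative transform from the source camera to the target camera, and divides the translation by the source camera's distance to the face. Because that distance scales with the scene, the result is unchanged under uniform rescaling, which is the scale ambiguity the abstract refers to. All function names here (`scale_aware_relative_pose`, `rot_y`, and the small matrix helpers) are hypothetical.

```python
import math
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(a[i][k] * v[k] for k in range(3)) for i in range(3))


def sub(u: Vec3, v: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(u, v))


def rot_y(theta: float) -> Mat3:
    """Rotation about the vertical (y) axis, e.g. a camera orbiting the head."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def camera_center(R: Mat3, t: Vec3) -> Vec3:
    # Assumed convention x_cam = R @ x_world + t, so the camera center is -R^T t.
    return tuple(-x for x in matvec(transpose(R), t))


def scale_aware_relative_pose(R_src: Mat3, t_src: Vec3, R_tgt: Mat3, t_tgt: Vec3,
                              face_center: Vec3) -> Tuple[Mat3, Vec3]:
    """Hypothetical face-normalized relative camera transform (source -> target).

    Not FaceCam's published representation: an illustrative assumption in which
    translation is expressed in units of the source camera-to-face distance.
    """
    # Relative rotation maps source-camera coordinates into target-camera coordinates.
    R_rel = matmul(R_tgt, transpose(R_src))
    # Matching relative translation: x_tgt = R_rel @ x_src + t_rel.
    t_rel = sub(t_tgt, matvec(R_rel, t_src))
    # Face-derived scale: distance from the source camera center to the face center.
    d = math.dist(face_center, camera_center(R_src, t_src))
    # Dividing by d removes the global scale ambiguity of the reconstruction.
    t_norm = tuple(x / d for x in t_rel)
    return R_rel, t_norm


if __name__ == "__main__":
    # Source camera 2 units in front of a face at the origin; target orbits 0.3 rad.
    R_src, t_src = rot_y(0.0), (0.0, 0.0, 2.0)
    R_tgt, t_tgt = rot_y(0.3), (0.0, 0.0, 2.0)
    _, t_norm = scale_aware_relative_pose(R_src, t_src, R_tgt, t_tgt, (0.0, 0.0, 0.0))
    print(t_norm)
```

Rescaling every translation and the face center by the same factor leaves `t_norm` unchanged, so a conditioning signal built this way would not depend on the arbitrary scale of an estimated reconstruction. Anchoring the scale to the face, rather than to the scene, is the design choice the sketch is meant to illustrate.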