The paper addresses the challenge of generating first-person (ego) videos from synchronized third-person (exo) videos, where discontinuities between the two views hinder existing video generation models. The authors propose Syn2Seq-Forcing, a method that interpolates between the exo and ego videos to create a continuous sequence suitable for diffusion-based sequence models. This approach significantly improves the coherence of generated videos by mitigating the impact of spatio-temporal jumps.
Interpolating between the third-person and first-person videos, even without interpolating camera poses, is enough to improve exo-to-ego generation significantly, which suggests that spatio-temporal discontinuities are the main bottleneck.
Exo-to-Ego video generation aims to synthesize a first-person video from a synchronized third-person view and corresponding camera poses. While paired supervision is available, synchronized exo-ego data inherently introduces substantial spatio-temporal and geometric discontinuities, violating the smooth-motion assumptions of standard video generation benchmarks. We identify this synchronization-induced jump as the central challenge and propose Syn2Seq-Forcing, a sequential formulation that interpolates between the source and target videos to form a single continuous signal. By reframing Exo2Ego as sequential signal modeling rather than a conventional condition-output task, our approach enables diffusion-based sequence models, e.g., Diffusion Forcing Transformers (DFoT), to capture coherent transitions across frames more effectively. Empirically, we show that interpolating only the videos, without performing pose interpolation, already produces significant improvements, emphasizing that the dominant difficulty arises from spatio-temporal discontinuities. Beyond immediate performance gains, this formulation establishes a general and flexible framework capable of unifying both Exo2Ego and Ego2Exo generation within a single continuous sequence model, providing a principled foundation for future research in cross-view video synthesis.
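The idea of turning a synchronized exo-ego pair into one continuous signal can be illustrated with a minimal sketch. The abstract does not specify how the interpolation is performed, so the pixel-space linear blend below is an assumption chosen only for illustration, not the authors' method. The function name `interpolate_exo_to_ego`, the parameter `num_steps`, and the `(T, H, W, C)` array layout are likewise hypothetical.

```python
import numpy as np


def interpolate_exo_to_ego(exo, ego, num_steps):
    """Build a sequence of videos that moves gradually from exo to ego.

    ASSUMPTION: a plain linear blend in pixel space. The paper's actual
    interpolation scheme is not described in the abstract.

    exo, ego: synchronized videos with identical shape (T, H, W, C).
    num_steps: number of intermediate videos between the two endpoints.
    Returns an array of shape (num_steps + 2, T, H, W, C) whose first
    entry is the exo video and whose last entry is the ego video.
    """
    exo = np.asarray(exo, dtype=np.float32)
    ego = np.asarray(ego, dtype=np.float32)
    if exo.shape != ego.shape:
        raise ValueError("exo and ego must share the same shape (T, H, W, C)")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")

    # Blend weights from 0 (pure exo) to 1 (pure ego), endpoints included.
    alphas = np.linspace(0.0, 1.0, num_steps + 2, dtype=np.float32)
    # Reshape so the weights broadcast over (T, H, W, C).
    alphas = alphas.reshape(-1, 1, 1, 1, 1)
    return (1.0 - alphas) * exo[None] + alphas * ego[None]


# Toy data: an all-black "exo" clip and an all-white "ego" clip.
exo_clip = np.zeros((4, 8, 8, 3), dtype=np.float32)
ego_clip = np.ones((4, 8, 8, 3), dtype=np.float32)
sequence = interpolate_exo_to_ego(exo_clip, ego_clip, num_steps=3)

# Largest per-pixel change between neighbouring entries of the sequence,
# compared with the change when jumping directly from exo to ego.
largest_step = float(np.abs(np.diff(sequence, axis=0)).max())
direct_jump = float(np.abs(ego_clip - exo_clip).max())
print(sequence.shape, largest_step, direct_jump)
```

In this toy setting, each step along the interpolated sequence changes the pixels far less than the direct exo-to-ego jump does, which is the kind of smoothness a sequence model such as DFoT is described as benefiting from. Camera poses are left untouched here, mirroring the abstract's remark that interpolating only the videos already helps.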