ONE-SHOT is a framework for compositional human-environment video generation that factorizes the generative process into disentangled signals. It decouples human dynamics from environmental cues through a canonical-space injection mechanism based on cross-attention, and it establishes spatial correspondences between domains using Dynamic-Grounded-RoPE. A Hybrid Context Integration mechanism maintains subject and scene consistency across long-horizon synthesis. The authors report that ONE-SHOT outperforms state-of-the-art methods in structural control and creative diversity.
Achieve fine-grained control and creative flexibility in human-environment video synthesis without heavy 3D pre-processing, thanks to a novel spatial-decoupled motion injection technique.
Recent advances in Video Foundation Models (VFMs) have revolutionized human-centric video synthesis, yet fine-grained and independent editing of subjects and scenes remains a critical challenge. Existing attempts to incorporate richer environment control through rigid 3D geometric compositions often encounter a stark trade-off between precise control and generative flexibility. Furthermore, heavy 3D pre-processing still limits practical scalability. In this paper, we propose ONE-SHOT, a parameter-efficient framework for compositional human-environment video generation. Our key insight is to factorize the generative process into disentangled signals. Specifically, we introduce a canonical-space injection mechanism that decouples human dynamics from environmental cues via cross-attention. We also propose Dynamic-Grounded-RoPE, a novel positional embedding strategy that establishes spatial correspondences between disparate spatial domains without any heuristic 3D alignments. To support long-horizon synthesis, we introduce a Hybrid Context Integration mechanism that maintains subject and scene consistency across minute-level generations. Experiments demonstrate that our method significantly outperforms state-of-the-art methods, offering superior structural control and creative diversity for video synthesis. Our project page is available at: https://martayang.github.io/ONE-SHOT/.
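For readers unfamiliar with the positional embedding that Dynamic-Grounded-RoPE builds on, the following is a minimal sketch of standard 1D rotary position embedding (RoPE). It is not the paper's Dynamic-Grounded-RoPE, whose details the abstract does not give. The function names `rope` and `dot`, the toy vectors, and the `base` value of 10000 are illustrative assumptions. The sketch shows the property that makes RoPE useful for relating positions: the attention score between a rotated query and a rotated key depends only on their relative offset.

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply standard 1D rotary position embedding to an even-length vector.

    Illustrative sketch only: this is plain RoPE, not Dynamic-Grounded-RoPE.
    """
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        # Each consecutive pair of dimensions is rotated at its own frequency.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (hypothetical values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.0, 0.5, -0.5, 1.5]

# Both position pairs have the same relative offset of 2.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 7), rope(k, 5))
print(abs(score_a - score_b) < 1e-9)
```

The two scores agree up to floating-point error because the rotations applied to the query and key compose into a single rotation by the position difference. How ONE-SHOT assigns positions across its human and environment domains is specific to the paper and is not reproduced here.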