Search papers, labs, and topics across Lattice.
ShanghaiTech University
2
0
3
9
Achieve realistic video object insertion by lifting 2D images into multi-view representations, ensuring consistent appearance and handling occlusions in dynamic scenes.
Diffusion models, guided by vision-language models and geometric constraints, can now reconstruct realistic 3D human interactions from monocular video, even with severe occlusions.