Fudan University
HiCoDiT achieves superior audio-visual alignment by harnessing the hierarchical nature of speech tokens, outperforming traditional video-to-speech (VTS) methods in both fidelity and expressiveness.
Achieve significantly more realistic and lip-synced movie dubbing by modeling the cognitive processes of professional actors with a novel diffusion transformer architecture.