Terminal Technology Department, Alipay, Ant Group
Code: https://github.com/AutoLab-SAI-SJTU/QVLA
Equal contribution.
Achieve real-time, synchronized audio-visual generation at 25 FPS by distilling a bidirectional diffusion model into a fast autoregressive architecture, overcoming training instability with novel alignment and token-handling techniques.