School of AI, Shanghai Jiao Tong University; Kling Team, Kuaishou Technology
Moving beyond purely text-based chain-of-thought improves audio-visual reasoning: textual reasoning steps are interleaved with a unified latent space that preserves dense sensory information.