Tencent Hunyuan
Frame-level causal attention is all you need for effective visual reconstruction in unified multimodal models.
Current audio-visual models nail unimodal quality but still struggle to make music and dance move together rhythmically, highlighting a key gap TMD-Bench is designed to address.
LMMs can learn to generate images *and* improve their understanding abilities, without catastrophic forgetting, by carefully disentangling and sharing experts within a MoE architecture.
Achieve SOTA in both visual generation and understanding by harmonizing generative and semantic representations within a single ViT architecture.
Ditch discrete visual tokens: UniCom achieves SOTA multimodal generation by compressing continuous semantic representations, unlocking better controllability and consistency in image editing.
The largest open-source image generative model to date, HunyuanImage 3.0, achieves state-of-the-art performance using a Mixture-of-Experts architecture and native Chain-of-Thoughts schema.