Search papers, labs, and topics across Lattice.
Tencent Hunyuan
8
0
7
5
Frame-level causal attention is all you need for effective visual reconstruction in unified multimodal models.
Spatial-Omni achieves superior spatial audio understanding by seamlessly integrating FOA encoding into existing LLMs, outperforming traditional models without compromising general audio processing.
Flow-DPPO outperforms traditional PPO methods by achieving higher rewards and greater training stability through a novel divergence proximal constraint.
Smooth gradient adjustments in DRPO prevent harmful policy shifts, leading to more stable and efficient LLM training.
Current audio editing models are failing spectacularly, with an Exact Match Rate below 5% in complex tasks, exposing a critical need for improvement.
LMMs can learn to generate images *and* improve their understanding abilities, without catastrophic forgetting, by carefully disentangling and sharing experts within a MoE architecture.
Achieve SOTA in both visual generation and understanding by harmonizing generative and semantic representations within a single ViT architecture.
Ditch discrete visual tokens: UniCom achieves SOTA multimodal generation by compressing continuous semantic representations, unlocking better controllability and consistency in image editing.