Tencent Hunyuan
Frame-level causal attention is all you need for effective visual reconstruction in unified multimodal models.
Real-time video understanding with transparent reasoning: the proposed model aligns its response timing with the supporting visual evidence, offering a breakthrough for online video LLMs.
By "dreaming ahead" with learned latent visual dynamics, LatentPilot achieves state-of-the-art vision-and-language navigation, demonstrating the power of future-aware reasoning without needing future observations at test time.
HunyuanImage 3.0, the largest open-source image generation model to date, achieves state-of-the-art performance using a Mixture-of-Experts architecture and a native Chain-of-Thought schema.