By tightly coupling reasoning, searching, and generation, Unify-Agent achieves state-of-the-art world-grounded image synthesis, rivaling closed-source models and opening new avenues for agent-based multimodal generation.
Image generation takes a leap toward real-world knowledge by training an agent that actively searches for and integrates external information, substantially boosting performance on knowledge-intensive tasks.
LongCat-Next breaks with the language-centric paradigm by unifying text, vision, and audio in a single autoregressive model with minimal modality-specific design, reconciling understanding and generation in discrete vision modeling.
Achieve photorealistic and structurally consistent weather editing for autonomous driving videos without the massive datasets typically required by generative models.
Representation-Pivoted Autoencoders enable diffusion models to generate and edit images with higher fidelity by learning a compressed latent space that preserves the semantics of pre-trained visual representations.