LongCat-Next shatters the language-centric paradigm by unifying text, vision, and audio into a single autoregressive model with minimal modality-specific design, finally reconciling understanding and generation in discrete vision modeling.
Forget painstakingly curating 3D-consistent datasets: this RL approach uses a 3D foundation model to *verify* consistency, turning 3D scene editing into a tractable reinforcement learning problem.