Central South University, Meituan Inc
Current MLLMs are still surprisingly reliant on textual reasoning, even when visual information is crucial for solving STEM problems.
LongCat-Next shatters the language-centric paradigm by unifying text, vision, and audio into a single autoregressive model with minimal modality-specific design, finally reconciling understanding and generation in discrete vision modeling.