i1 not only matches the performance of leading text-to-image models but also sets a new standard for fully open models, outperforming the previous best by nearly 30 percentage points.
LCLMs redefine the efficiency of long-context inference, achieving superior compression without sacrificing model quality.
Even the top-performing MLLMs struggle with visual reasoning, achieving only 64% accuracy on a benchmark designed to reflect real-world diversity.
Forget complex architectures and task-specific designs: VLMs are already native 3D learners with the right training recipe.
VLMs can get a 10% boost in spatial reasoning and 3D understanding by training on just 10,000 synthetic images generated automatically from task keywords.
Vero, an open-source VLM trained with RL on a diverse 600K-sample dataset, closes the performance gap with proprietary models and shows that broad task coverage, not just scale, is the key to unlocking general visual reasoning.