Ditch the feature extraction pipeline: GenMask directly generates segmentation masks with a diffusion transformer, achieving SOTA results by harmonizing mask and image generation in a single model.
Forget real-world video datasets: training VLMs on just 7.7K synthetic videos with temporal primitives beats 165K real-world examples, unlocking surprisingly effective transfer learning for video reasoning.
Multi-hop data synthesis using HopChain boosts VLM performance across a wide range of tasks, with gains of over 50 points in accuracy for ultra-long-context reasoning.
Forget scaling reasoning: this work shows that scaling visual perception using code-grounded data is the real key to unlocking MLLMs' STEM abilities.
A new family of GUI agents, GUI-Owl-1.5, leapfrogs existing open-source models on 20+ GUI benchmarks, proving that multi-platform, real-time GUI automation is now within reach.