6 papers from Allen Institute for AI (AI2) on Multimodal Models
Forget redrawing diagrams by hand: VFIG, a new vision-language model, can automatically convert rasterized figures into editable SVGs with near GPT-5.2 quality.
Pixel-space diffusion models get a serious boost: V-Co reveals a simple recipe for visual co-denoising that outperforms existing methods on ImageNet-256 with fewer training epochs.
Training on SciMDR, a new 300K-example QA dataset synthesized from scientific papers, substantially boosts model performance on complex, document-level scientific reasoning tasks.
AI can now generate hour-long videos with consistent characters and backgrounds, thanks to a new framework that delivers seamless transitions between shots.
VLMs that ace math problems still flunk at understanding *how* students go wrong, highlighting a critical gap for AI in education.
Scaling VLMs won't magically unlock reasoning skills: you also need to address the reporting bias in training data that suppresses tacit, rarely-stated information.