3 papers from Allen Institute for AI (AI2) on Multimodal Models
Time is a learnable visual concept: models can now reason about and manipulate the flow of time in videos, opening doors to temporally controllable video generation and temporal forensics.
LG's EXAONE 4.5 shows that strategically curating training data, particularly document-centric corpora, unlocks substantial gains on specialized tasks such as document understanding and Korean contextual reasoning while maintaining competitive general performance.
Forget training on closed sets: WildDet3D leverages geometric cues and diverse prompts to achieve state-of-the-art 3D object detection across 13.5K categories in the wild.