Search papers, labs, and topics across Lattice.
2
0
4
Unified multimodal models often *hurt* performance on multimodal understanding tasks, except for spatial reasoning, visual illusions, and multi-round reasoning, challenging the assumption that generation universally improves understanding.
Turns out, skipping the boring parts of a video (like static backgrounds) makes your vision AI both faster and smarter, beating state-of-the-art models with less data.