GUI agents struggle with long tasks not because they mis-click, but because they forget what they were doing. A new "anchored memory" method can fix that.
Video fine-tuning boosts MLLMs' video smarts but surprisingly dumbs them down on static images: a trade-off you can't simply brute-force away with more frames.
Ditch pixel-level rendering and external executors: LatentGeo learns continuous latent visual representations to internalize auxiliary geometric constructions for multimodal geometric reasoning, boosting performance on complex geometry problems.