University of Chinese Academy of Sciences, Beijing, China
Forget text dominance: today's omni-modal LLMs surprisingly favor visual inputs, creating new challenges for cross-modal reasoning.
By grounding reflection in the visual artifacts of presentation slides, DeepPresenter enables agents to iteratively refine presentations in a way that internal reasoning traces alone cannot.