Search papers, labs, and topics across Lattice.
4
0
6
Current multimodal math models struggle with visual interpretation, symbol alignment, and consistent reasoning, highlighting the need for a unified "Perception-Alignment-Reasoning" framework.
By generating and compressing Chain-of-Thought reasoning, TRACE achieves state-of-the-art multimodal retrieval and learns to adaptively route queries through reasoning only when necessary, balancing accuracy and efficiency.
WildActor generates realistic human videos from any viewpoint while strictly preserving identity, a feat that existing methods struggle with due to face-centric behavior or copy-paste artifacts.
Training-free zero-shot image retrieval just got a whole lot better: WISER's "retrieve-verify-refine" pipeline achieves state-of-the-art results by intelligently fusing text-to-image and image-to-image retrieval.