Institute of Information Engineering, Chinese Academy of Sciences
Unifying multimodal AI architectures doesn't just boost performance; it also dramatically degrades safety, especially in open-source models.
Forget tedious fine-tuning: leveraging molecule identifiers as visual prompts unlocks surprisingly powerful zero-shot chemical reaction diagram parsing in VLMs.
Text-only foundation models can perform surprisingly well on complex 3D spatial reasoning tasks, rivaling multimodal models, when equipped with a structured spatial representation derived from 3D reconstruction.