McGill University
Endowing VLMs with intrinsic 3D geometric awareness and physical interaction cues via XEmbodied substantially boosts performance on spatial reasoning and embodied tasks, surpassing existing 2D image-text pretrained models.
Current video-editing models still struggle to balance visual quality, instruction adherence, and localized edits, as revealed by a new benchmark designed to disentangle these factors.
A new million-scale dataset of paired code, images, data, and reasoning boosts the chart-understanding abilities of VLMs.