Aalto University
Vision-language models (VLMs) struggle to align assembly diagrams with videos because the two input types occupy disjoint visual representation spaces, revealing a fundamental limitation in cross-modal understanding.