Today's visual generation models are often evaluated on the wrong things, leading to inflated performance claims that mask critical failures in spatial reasoning, temporal consistency, and causal understanding.
VLMs aren't using 3D geometry tokens effectively for spatial reasoning, but a simple masking and gated fusion strategy can unlock significant performance gains.