Bidirectional interaction between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables a unified multimodal model to achieve spatial intelligence beyond general visual competence.
The fragmented field of world modeling can now be unified under a "levels x laws" taxonomy, revealing critical gaps in autonomous model revision and decision-centric evaluation.
Existing image editing models fall short on precise spatial manipulation, but a new benchmark and dataset show how to close the gap.
End-to-end MLLMs struggle with visual reasoning, but a program synthesis approach that explicitly represents compositional logic dramatically improves accuracy and transparency.