Search papers, labs, and topics across Lattice.
1
0
3
4
Multimodal models can "see" the image but still fail at reasoning because the visual input distracts the routing mechanism from activating the right experts.