Stable Diffusion can serve as a surprisingly effective, instruction-aware visual encoder for multimodal large language models (MLLMs), outperforming CLIP on tasks that require spatial and compositional reasoning.