Search papers, labs, and topics across Lattice.
2
0
5
Multimodal models are often blind at birth: a new "Visual Attention Score" reveals they struggle to focus on visual inputs during cold-start, but a simple attention-guided fix can boost performance by 7%.
Text-to-image models wash away the unique stylistic fingerprints of their captioning counterparts, revealing a surprising disconnect between text and image generation.