Counterintuitively, cropping and resizing a region of interest before refinement dramatically improves the fidelity of local detail restoration in diffusion models, enabling near-perfect background preservation.
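A minimal sketch of the crop, resize, refine, and paste-back idea described above, assuming a NumPy image array. The names `resize_nearest`, `refine_region`, `refine_fn`, and `work_size` are hypothetical, `refine_fn` is only a stand-in for a diffusion refinement step, and nearest-neighbour resizing is a simplification chosen to keep the example dependency-free. It is not the summarized paper's implementation.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array (simplified stand-in)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows[:, None], cols]


def refine_region(image, box, refine_fn, work_size=64):
    """Crop `box`, enlarge it, refine it, shrink it back, and paste it in place.

    `box` is (top, left, bottom, right) in pixels. `refine_fn` is a
    hypothetical placeholder for the refinement model.
    """
    top, left, bottom, right = box
    crop = image[top:bottom, left:right]

    # Enlarge the region so the refiner works at a higher effective resolution.
    enlarged = resize_nearest(crop, work_size, work_size)
    refined = refine_fn(enlarged)

    # Shrink back to the original crop size and paste into a copy of the image.
    restored = resize_nearest(refined, bottom - top, right - left)
    out = image.copy()
    out[top:bottom, left:right] = restored
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Toy "refiner": invert the colours of the enlarged crop.
result = refine_region(image, (8, 8, 16, 16), lambda x: 255 - x)
```

Because only the pasted-back slice is written, every pixel outside the box is copied from the input unchanged, which is the sense in which the background is preserved in this sketch.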
Latent visual reasoning in multimodal LLMs is largely ineffective: the "imagination" carried out in latent space neither attends to the input nor influences the output, which makes explicit text-based imagination a surprisingly better alternative.