Munich Center for Machine Learning
MLLMs' image segmentation prowess isn't a given: a critical adapter layer actually *hurts* performance, forcing the LLM to recover via attention-mediated refinement.
MLLMs are surprisingly prone to hallucinating subtle details, especially when asked about the absence of specific attributes or relationships within an image.
Unlock precise, training-free color control in text-to-image models by directly manipulating the latent space's emergent Hue, Saturation, and Lightness structure.
You can now audit black-box vision models for biases and failure modes using only their output probabilities, thanks to a clever LLM-powered semantic search.
Achieve meaningful vision-language model alignment with significantly less supervision by leveraging unpaired data via optimal transport.
Sparse autoencoders unlock VLM interpretability: by intervening on CLIP's vision encoder, you can directly steer multimodal LLMs like LLaVA.