Achieve meaningful vision-language model alignment with significantly less supervision by leveraging unpaired data via optimal transport.
Sparse autoencoders unlock VLM interpretability, letting you directly steer multimodal LLMs like LLaVA by intervening on CLIP's vision encoder.