MT-EditFlow bridges the gap between local planning and global success in multi-turn image editing, achieving a significant performance boost over leading models.
PinPoint, a training-free point selector, closes the performance gap between zero-shot VLMs and fine-tuned specialists in referring image segmentation by choosing interior points of the target object as prompts for SAM.
Text-to-video generation gets a 1.58x speedup from CalibAtt, a training-free method that exploits consistent sparsity patterns in attention layers.
RL fine-tuning can make vision-language models *less* reliable reasoners, as gains in benchmark accuracy come at the cost of faithfulness to the underlying visual grounding and chain-of-thought.