The University of Sydney
Task-aware localization, using attention cues from both source and target image streams, significantly reduces over-editing in instruction-based image editing, even when applied to strong diffusion transformer backbones.
MLLMs can achieve 4x faster inference without sacrificing accuracy by focusing only on the image regions relevant to the query.