The paper introduces MELT, a neural network architecture for Composed Image Retrieval (CIR) that addresses frequency bias and sensitivity to hard negative samples. MELT balances frequent and rare modification semantics by assigning increased attention to rare modification semantics in multimodal contexts. It also applies diffusion-based denoising to hard negative samples with high similarity scores, enhancing multimodal fusion and matching and yielding superior performance on CIR benchmarks.
Diffusion-based denoising can significantly improve composed image retrieval by making similarity scores more robust to hard negative samples.
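To make the denoising idea concrete, here is a minimal toy sketch of the diffusion mechanics that such an approach builds on: a forward process corrupts a vector of similarity scores with Gaussian noise, and a deterministic reverse process (DDIM-style, sigma = 0) removes it step by step. All names here are illustrative, and the true noise is used as an oracle in place of the learned noise predictor that a real system like MELT's denoiser would train; this only demonstrates the update rules, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard DDPM noise schedule (toy values).
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)  # cumulative alpha-bar_t

def diffuse(x0, t, noise):
    # Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * noise

def denoise_step(xt, t, eps_hat):
    # One deterministic reverse step: estimate x0 from the predicted
    # noise, then re-project onto the previous noise level.
    x0_hat = (xt - np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(abar[t])
    if t == 0:
        return x0_hat
    return np.sqrt(abar[t - 1]) * x0_hat + np.sqrt(1.0 - abar[t - 1]) * eps_hat

# Toy similarity scores between a composed query and candidate images.
scores = np.array([0.9, 0.2, 0.85, 0.1])
eps = rng.standard_normal(scores.shape)

x = diffuse(scores, T - 1, eps)          # fully noised scores
for t in range(T - 1, -1, -1):
    x = denoise_step(x, t, eps)          # oracle eps stands in for a trained predictor

# With the oracle noise, the reverse chain recovers the clean scores.
```

In a trained system, `eps_hat` would come from a learned network conditioned on the query, so denoising suppresses the spurious high-similarity components contributed by hard negatives rather than recovering the input exactly.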
Composed Image Retrieval (CIR) uses a reference image and a modification text as a query to retrieve a target image that satisfies the requirement of "modifying the reference image according to the text instructions". However, existing CIR methods face two limitations: (1) frequency bias leading to "Rare Sample Neglect", and (2) susceptibility of similarity scores to interference from hard negative samples and noise. Addressing these limitations raises two key challenges: asymmetric rare-semantic localization and robust similarity estimation under hard negative samples. To solve these challenges, we propose the Modification frEquentation-rarity baLance neTwork (MELT). MELT assigns increased attention to rare modification semantics in multimodal contexts while applying diffusion-based denoising to hard negative samples with high similarity scores, enhancing multimodal fusion and matching. Extensive experiments on two CIR benchmarks validate the superior performance of MELT. Code is available at https://github.com/luckylittlezhi/MELT.
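The frequency-rarity balancing described above can be sketched as an attention mechanism whose logits are biased by inverse-frequency weights, so rare modification semantics receive extra attention mass. This is a minimal illustration under assumed inputs, not the paper's actual architecture; the function names, the smoothing constant, and the log-bias form are all hypothetical choices.

```python
import numpy as np

def rarity_weights(token_counts, smoothing=1.0):
    # Inverse-frequency weights: rarer modification tokens get larger
    # weights; `smoothing` avoids division by zero for unseen tokens.
    counts = np.asarray(token_counts, dtype=float)
    w = 1.0 / (counts + smoothing)
    return w / w.sum()

def rarity_balanced_attention(query, token_feats, token_counts, tau=1.0):
    # Scaled dot-product attention over modification-token features,
    # with a log-rarity bias added to the logits so that rare semantics
    # are up-weighted relative to frequent ones.
    d = len(query)
    logits = token_feats @ query / np.sqrt(d) / tau
    logits = logits + np.log(rarity_weights(token_counts) * len(token_counts))
    attn = np.exp(logits - logits.max())
    attn = attn / attn.sum()
    context = attn @ token_feats  # attended multimodal context vector
    return context, attn

# Two tokens with identical features but very different corpus frequency:
# the rare token (count 1) should attract more attention than the
# frequent one (count 100).
query = np.ones(4)
feats = np.ones((2, 4))
context, attn = rarity_balanced_attention(query, feats, [100, 1])
```

The log-bias form is convenient because, when all content logits are equal, the attention distribution reduces exactly to the normalized rarity weights.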