This paper introduces the HABIT framework, which addresses the Noise Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR) with a two-module design that makes retrieval more robust to noisy triplet annotations. The Mutual Knowledge Estimation Module identifies clean samples by computing the Transition Rate of mutual information between the composed feature and the target image, while the Dual-consistency Progressive Learning Module mimics human habit formation through collaboration between a historical model and the current model. Experiments on two standard CIR datasets show that HABIT significantly outperforms most compared methods across various noise ratios.
HABIT improves composed image retrieval under noisy triplet annotations by simulating human habit formation, retaining good habits and calibrating bad ones to tackle the Noise Triplet Correspondence problem.
Composed Image Retrieval (CIR) is a flexible image retrieval paradigm that enables users to accurately locate a target image through a multimodal query composed of a reference image and modification text. Although this task has shown promising applications in personalized search and recommendation systems, it faces a severe challenge in practical scenarios known as the Noise Triplet Correspondence (NTC) problem. This issue arises primarily from the high cost and subjectivity of annotating triplet data. In addressing this problem, we identify two central challenges: the difficulty of precisely estimating composed semantic discrepancy, and insufficient progressive adaptation to modification discrepancy. To tackle these challenges, we propose a cHrono-synergiA roBust progressIve learning framework for composed image reTrieval (HABIT), which consists of two core modules. First, the Mutual Knowledge Estimation Module quantifies sample cleanliness by calculating the Transition Rate of mutual information between the composed feature and the target image, thereby identifying clean samples that align with the intended modification semantics. Second, the Dual-consistency Progressive Learning Module introduces a collaborative mechanism between the historical and current models, simulating human habit formation to retain good habits and calibrate bad habits, ultimately enabling robust learning in the presence of NTC. Extensive experiments on two standard CIR datasets demonstrate that HABIT significantly outperforms most methods under various noise ratios, exhibiting superior robustness and retrieval performance. Code is available at https://github.com/Lee-zixu/HABIT
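To make the idea of scoring sample cleanliness more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not define the Transition Rate, so everything here is an assumption made for illustration: a per-sample InfoNCE term stands in as a mutual-information proxy between composed features and target features, the "transition rate" is taken to be the relative change of that proxy between two training checkpoints, and the helper names (`per_sample_infonce`, `transition_rate`, `select_clean`) and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np


def per_sample_infonce(composed, target, temperature=0.07):
    """Per-sample InfoNCE term, used here as an assumed mutual-information proxy.

    composed, target: arrays of shape (N, D); row i of each forms one triplet's
    composed query feature and target image feature.
    Returns an array of shape (N,), each entry bounded above by log(N).
    """
    # Cosine similarity between every composed feature and every target feature.
    c = composed / np.linalg.norm(composed, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = c @ t.T / temperature
    # Numerically stable log-softmax over the candidate targets in the batch.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Keep only the matched pair (i, i) for each sample.
    return np.log(len(c)) + np.diag(log_prob)


def transition_rate(mi_prev, mi_now, eps=1e-8):
    """Assumed definition: relative change of the proxy between two checkpoints."""
    return (mi_now - mi_prev) / (np.abs(mi_prev) + eps)


def select_clean(rate, keep_ratio=0.5):
    """Hypothetical selection rule: keep the samples whose proxy grew the most."""
    k = max(1, int(round(keep_ratio * len(rate))))
    mask = np.zeros(len(rate), dtype=bool)
    mask[np.argsort(-rate)[:k]] = True
    return mask
```

Under these assumptions, a triplet whose composed feature moves toward its annotated target as training proceeds receives a high rate, while a triplet that stays mismatched receives a rate near zero. The threshold rule and the choice of proxy would need to follow the released code to reproduce the paper's behavior.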