Apr 20, 2026 · arXiv:2604.18037

HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval

AI Summary

This paper introduces HABIT, a framework that addresses the Noise Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR) with two modules designed to keep retrieval robust when training triplets are noisy. The Mutual Knowledge Estimation Module identifies clean samples by computing the transition rate of mutual information between the composed feature and the target image. The Dual-consistency Progressive Learning Module coordinates a historical model with the current model, mimicking human habit formation to retain what was learned well and correct what was not. Experiments on two standard CIR datasets show that HABIT significantly outperforms most existing methods under various noise ratios.

Key Contribution

HABIT makes composed image retrieval robust to the Noise Triplet Correspondence problem by scoring sample cleanliness with a mutual-information transition rate and by coordinating a historical model with the current one, a mechanism modeled on human habit formation.

Abstract

Composed Image Retrieval (CIR) is a flexible image retrieval paradigm that enables users to accurately locate a target image through a multimodal query composed of a reference image and modification text. Although this task has shown promise in personalized search and recommendation systems, in practical scenarios it faces a severe challenge known as the Noise Triplet Correspondence (NTC) problem. This issue arises primarily from the high cost and subjectivity of annotating triplet data. To address this problem, we identify two central challenges: the difficulty of precisely estimating the composed semantic discrepancy and the insufficient progressive adaptation to the modification discrepancy. To tackle these challenges, we propose a cHrono-synergiA roBust progressIve learning framework for composed image reTrieval (HABIT), which consists of two core modules. First, the Mutual Knowledge Estimation Module quantifies sample cleanliness by calculating the Transition Rate of mutual information between the composed feature and the target image, thereby identifying clean samples that align with the intended modification semantics. Second, the Dual-consistency Progressive Learning Module introduces a collaborative mechanism between the historical and current models, simulating human habit formation to retain good habits and calibrate bad habits, ultimately enabling robust learning in the presence of NTC. Extensive experiments on two standard CIR datasets demonstrate that HABIT significantly outperforms most methods under various noise ratios, exhibiting superior robustness and retrieval performance. Code is available at https://github.com/Lee-zixu/HABIT
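The abstract does not give formulas, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes that mutual information is approximated by a per-sample InfoNCE-style score, that the "transition rate" is the change of that score between two training checkpoints, that samples with the largest increase are treated as clean, and that the historical model is an exponential moving average of the current one. All function names and parameters (`infonce_scores`, `transition_rate`, `select_clean`, `ema_update`, `temperature`, `keep_ratio`, `momentum`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)  # guard against zero vectors


def infonce_scores(composed, targets, temperature=0.1):
    """Per-sample InfoNCE-style score (assumed proxy for mutual information).

    composed[i] is the composed (reference image + text) feature and
    targets[i] its annotated target feature; other targets act as negatives.
    """
    n = len(composed)
    scores = []
    for i in range(n):
        logits = [cosine(composed[i], targets[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        scores.append(logits[i] - log_denominator + math.log(n))
    return scores


def transition_rate(previous_scores, current_scores):
    """Assumed definition: per-sample score change between two checkpoints."""
    return [c - p for p, c in zip(previous_scores, current_scores)]


def select_clean(rates, keep_ratio=0.5):
    """Assumed rule: keep the fraction of samples with the largest rates."""
    k = max(1, round(keep_ratio * len(rates)))
    ranked = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    return sorted(ranked[:k])


def ema_update(historical, current, momentum=0.9):
    """Assumed historical model: exponential moving average of parameters."""
    return [momentum * h + (1.0 - momentum) * c for h, c in zip(historical, current)]


if __name__ == "__main__":
    composed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    targets_early = [[0.6, 0.4], [0.5, 0.5], [1.0, -1.0]]
    targets_late = [[1.0, 0.1], [0.1, 1.0], [1.0, -1.0]]
    rates = transition_rate(
        infonce_scores(composed, targets_early),
        infonce_scores(composed, targets_late),
    )
    print("clean sample indices:", select_clean(rates, keep_ratio=0.67))
```

In a real pipeline the features would come from the current and historical encoders rather than fixed lists, and the selection threshold would likely be tuned per noise ratio; both choices are left open here because the abstract does not specify them.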

Computer Vision · Multimodal Models · Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
