QualiTeacher addresses the challenge of noisy pseudo-labels in real-world image restoration by conditioning the student model on pseudo-label quality, estimated via an ensemble of non-reference image quality assessment (NR-IQA) models. This lets the student learn a quality-graded restoration manifold, so it avoids imitating artifacts from low-quality labels and can extrapolate to higher-quality results. The framework also incorporates multi-augmentation, score-based preference optimization (inspired by DPO), and a cropped consistency loss to improve robustness and prevent adversarial over-optimization of the IQA models.
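The quality-estimation step can be illustrated with a minimal Python sketch. It assumes each NR-IQA model returns a raw scalar with a known range, min-max normalizes each score to [0, 1] (flipping metrics where lower is better), and averages the results with equal weights. The metric names, ranges, and averaging rule are hypothetical; the summary does not say how QualiTeacher fuses its ensemble.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IQASpec:
    """Assumed score range and direction for one NR-IQA model."""
    low: float
    high: float
    higher_is_better: bool = True


def normalize(score: float, spec: IQASpec) -> float:
    """Map a raw IQA score to [0, 1], where 1 means best quality."""
    x = (score - spec.low) / (spec.high - spec.low)
    x = min(max(x, 0.0), 1.0)  # clamp scores outside the assumed range
    return x if spec.higher_is_better else 1.0 - x


def ensemble_quality(scores: Dict[str, float], specs: Dict[str, IQASpec]) -> float:
    """Equal-weight mean of normalized scores (hypothetical fusion rule)."""
    normed = [normalize(scores[name], spec) for name, spec in specs.items()]
    return sum(normed) / len(normed)


# Hypothetical ensemble: a low-level and a semantic-level metric where
# higher is better, plus a distortion metric where lower is better.
SPECS = {
    "low_level": IQASpec(0.0, 100.0),
    "semantic": IQASpec(0.0, 1.0),
    "distortion": IQASpec(0.0, 10.0, higher_is_better=False),
}

if __name__ == "__main__":
    pl_scores = {"low_level": 80.0, "semantic": 0.6, "distortion": 3.0}
    q = ensemble_quality(pl_scores, SPECS)
    print(f"quality condition q = {q:.2f}")  # prints "quality condition q = 0.70"
```

In training, `q` would then be passed to the student alongside the degraded input (for example through a learned embedding), so that a low-`q` pseudo-label teaches the model what low quality looks like instead of being imitated unconditionally. That injection mechanism is also an assumption here.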
Instead of discarding noisy pseudo-labels in image restoration, QualiTeacher leverages them by teaching the model to understand and even surpass the quality levels they represent.
Real-world image restoration (RWIR) is a highly challenging task due to the absence of clean ground-truth images. Many recent methods resort to pseudo-label (PL) supervision, often within a Mean-Teacher (MT) framework. However, these methods face a critical paradox: unconditionally trusting the often imperfect, low-quality PLs forces the student model to learn undesirable artifacts, while discarding them severely limits data diversity and impairs model generalization. In this paper, we propose QualiTeacher, a novel framework that transforms pseudo-label quality from a noisy liability into a conditional supervisory signal. Instead of filtering, QualiTeacher explicitly conditions the student model on the quality of the PLs, estimated by an ensemble of complementary non-reference image quality assessment (NR-IQA) models spanning low-level distortion and semantic-level assessment. This strategy teaches the student network to learn a quality-graded restoration manifold, enabling it to understand what constitutes different quality levels. Consequently, it can not only avoid mimicking artifacts from low-quality labels but also extrapolate to generate results of higher quality than the teacher itself. To ensure the robustness and accuracy of this quality-driven learning, we further enhance the process with a multi-augmentation scheme to diversify the PL quality spectrum, a score-based preference optimization strategy inspired by Direct Preference Optimization (DPO) to enforce a monotonically ordered quality separation, and a cropped consistency loss to prevent adversarial over-optimization (reward hacking) of the IQA models. Experiments on standard RWIR benchmarks demonstrate that QualiTeacher can serve as a plug-and-play strategy to improve the quality of existing pseudo-labeling frameworks, establishing a new paradigm for learning from imperfect supervision. Code will be released.
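The score-based preference idea can be sketched as a DPO-style pairwise logistic loss: for every pair of outputs generated under different quality conditions, the output with the higher condition should receive the higher IQA score, and each pair contributes `-log sigmoid(beta * (s_w - s_l))`. This is a hedged illustration, not the paper's objective. The all-pairs scheme, the temperature `beta`, and the omission of DPO's reference-model term are assumptions, and in real training the scores would come from differentiable IQA models applied to student outputs rather than plain floats.

```python
import math
from itertools import combinations
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def score_preference_loss(
    conditions: Sequence[float], scores: Sequence[float], beta: float = 1.0
) -> float:
    """Mean pairwise logistic loss that pushes IQA scores to be
    monotonically ordered by the quality condition.

    conditions[i]: quality level the student was conditioned on for output i
    scores[i]:     IQA score assigned to output i
    """
    if len(conditions) != len(scores):
        raise ValueError("conditions and scores must have the same length")
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if conditions[i] == conditions[j]:
            continue  # no preference between outputs with equal conditions
        # Orient the pair so w is the preferred (higher-condition) output.
        w, l = (i, j) if conditions[i] > conditions[j] else (j, i)
        losses.append(-log_sigmoid(beta * (scores[w] - scores[l])))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    conditions = [0.2, 0.5, 0.9]
    ordered = [0.3, 0.6, 0.8]    # scores rise with the condition
    inverted = [0.8, 0.6, 0.3]   # scores fall with the condition
    print(score_preference_loss(conditions, ordered, beta=5.0))
    print(score_preference_loss(conditions, inverted, beta=5.0))
```

Because a loss like this only rewards higher IQA scores, a student could learn to exploit the IQA models instead of improving restoration; the abstract's cropped consistency loss is the paper's stated countermeasure to that reward hacking.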