This paper introduces a feedback reinforcement framework that aligns infrared and visible image fusion (IVIF) with human visual preferences. The authors construct a large-scale human feedback dataset for IVIF, containing subjective scores and artifact annotations and enriched by a fine-tuned large language model with expert review. Using this dataset, they train a reward model to quantify perceptual quality and fine-tune the fusion network via Group Relative Policy Optimization, reporting state-of-the-art performance in human perceptual alignment.
Forget handcrafted losses: this paper uses human feedback and reinforcement learning to create infrared and visible image fusion that actually looks good to people.
Infrared and visible image fusion (IVIF) integrates complementary modalities to enhance scene perception. Current methods predominantly focus on optimizing handcrafted losses and objective metrics, often resulting in fusion outcomes that do not align with human visual preferences. This challenge is further exacerbated by the ill-posed nature of IVIF, which severely limits its effectiveness in human perceptual environments such as security surveillance and driver assistance systems. To address these limitations, we propose a feedback reinforcement framework that bridges human evaluation to infrared and visible image fusion. To address the lack of human-centric evaluation metrics and data, we introduce the first large-scale human feedback dataset for IVIF, containing multidimensional subjective scores and artifact annotations, and enriched by a fine-tuned large language model with expert review. Based on this dataset, we design a domain-specific reward function and train a reward model to quantify perceptual quality. Guided by this reward, we fine-tune the fusion network through Group Relative Policy Optimization, achieving state-of-the-art performance that better aligns fused images with human aesthetics. Code is available at https://github.com/ALKA-Wind/EVAFusion.
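To make the last step more concrete, the following is a minimal sketch of the group-relative advantage computation that Group Relative Policy Optimization is generally built around: several candidate outputs for the same input are scored by a reward model, and each score is normalized against the mean and standard deviation of its own group. It is not taken from the paper's code. The function name, the example scores, the use of the population standard deviation, and the `eps` stabilizer are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of candidates.

    Each advantage is (reward - group mean) / (group std + eps).
    Using the population std and adding eps are illustrative choices,
    not details confirmed by the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four candidate fusions
# produced from the same infrared/visible image pair.
scores = [0.62, 0.71, 0.55, 0.80]
advantages = group_relative_advantages(scores)

# Candidates scoring above the group mean get a positive advantage,
# those below get a negative one.
for score, adv in zip(scores, advantages):
    print(f"reward={score:.2f}  advantage={adv:+.3f}")
```

Because the baseline is the group's own mean, this formulation needs no separate value network; the policy update then weights each candidate by its advantage, so fusions the reward model prefers are reinforced relative to their siblings.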