KAIST · UTokyo · Jun 16, 2026 · arXiv:2606.18066

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

Jisung Hwang, Yunhong Min, Jaihoon Kim, I-Chao Shen, Minhyuk Sung

AI Summary

This paper introduces the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term while leaving the pretrained reverse kernel unchanged. A whitening operator makes the reward gradient safe to inject as noise. This addresses the trade-off in existing methods: gradient-based guidance degrades sample quality, while search-based methods gain no gradient signal. Across various reward alignment tasks, NTRK outperforms recent state-of-the-art baselines without losing sample quality. On aesthetic generation, it surpasses the reward of the best baseline at 500 NFEs using only 25 NFEs, a 20× reduction in compute.

Key Contribution

On aesthetic generation, NTRK surpasses the reward that the best baseline reaches at 500 NFEs while using only 25 NFEs, a 20× reduction in compute.

Abstract

We introduce the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the pretrained reverse kernel unchanged and requiring only a single sample per step. Reward-guided sampling at inference time has greatly expanded the versatility of pretrained diffusion models. Yet existing methods face a trade-off. Gradient-based guidance shifts the reverse mean, steering generation but pushing intermediate states outside the region that the model was trained on and degrading quality. Search-based methods preserve quality but gain no gradient signal. No prior method achieves both. NTRK resolves this by keeping the reverse mean fixed and biasing the noise term toward high reward. We introduce a whitening operator, the central mechanism behind NTRK, that makes the reward gradient safe to inject as noise without losing its guiding signal. Across various reward alignment tasks, NTRK outperforms recent state-of-the-art baselines without losing sample quality. Remarkably, on aesthetic generation, NTRK surpasses the reward of the best baseline at 500 NFEs using only 25 NFEs, a 20× reduction in compute.
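The idea of keeping the reverse mean fixed and biasing only the noise can be illustrated with a toy sketch. The abstract does not define the whitening operator, so the `whiten` function below is an assumption (plain standardization to zero mean and unit variance), and the names `tilted_noise`, `reverse_step`, `mean_fn`, `reward_grad_fn`, and the mixing weight `alpha` are hypothetical, not taken from the paper.

```python
import numpy as np

def whiten(g, eps=1e-8):
    # ASSUMPTION: stand-in for the paper's whitening operator.
    # Standardizes the gradient so its entries have zero mean and unit
    # variance, matching the first two moments of standard Gaussian noise.
    g = g - g.mean()
    return g / (g.std() + eps)

def tilted_noise(grad, rng, alpha=0.3):
    # Mix fresh Gaussian noise with the whitened reward gradient.
    # The weights satisfy (1 - alpha^2) + alpha^2 = 1, so the average
    # squared entry of the result stays close to 1.
    z = rng.standard_normal(grad.shape)
    return np.sqrt(1.0 - alpha**2) * z + alpha * whiten(grad)

def reverse_step(x, mean_fn, sigma, reward_grad_fn, rng, alpha=0.3):
    # The reverse mean comes from the (hypothetical) pretrained model and
    # is left untouched; only the noise term carries the reward signal.
    mu = mean_fn(x)
    eps = tilted_noise(reward_grad_fn(x), rng, alpha)
    return mu + sigma * eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.ones(4096)
    x = rng.standard_normal(4096)
    # Toy reward r(x) = -||x - target||^2, with gradient -2 (x - target).
    x_next = reverse_step(
        x,
        mean_fn=lambda v: 0.9 * v,
        sigma=0.5,
        reward_grad_fn=lambda v: -2.0 * (v - target),
        rng=rng,
    )
    print(x_next.shape)
```

In this sketch the noise keeps roughly unit scale while being positively correlated with the reward gradient, which is the qualitative behavior the abstract describes. How NTRK actually constructs the whitening operator and the mixing is specified in the paper, not here.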

RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
