TeleAI · USTC · Apr 16, 2026 · arXiv:2604.14910

Reward-Aware Trajectory Shaping for Few-step Visual Generation

Rui Li, Bingyu Li, Yuanzhi Liang, HuangHai Bin, Chi Zhang, Xuelong Li

AI Summary

This paper introduces Reward-Aware Trajectory Shaping (RATS), a lightweight framework for few-step visual generation that moves beyond pure teacher imitation by incorporating preference alignment awareness. RATS aligns teacher and student latent trajectories at key denoising stages through horizon matching, and uses a reward-aware gate to adaptively regulate teacher guidance based on the relative reward performance of teacher and student. Experiments show that RATS substantially improves the efficiency-quality trade-off in few-step visual generation, significantly narrowing the gap between few-step students and stronger multi-step generators.

Key Contribution

Beyond rigid imitation: reward-aware trajectory shaping lets few-step generative models optimize toward reward-preferred quality and potentially surpass their multi-step teachers.

Abstract

Achieving high-fidelity generation in extremely few sampling steps has long been a central goal of generative modeling. Existing approaches largely rely on distillation-based frameworks to compress the original multi-step denoising process into a few-step generator. However, such methods inherently constrain the student to imitate a stronger multi-step teacher, imposing the teacher as an upper bound on student performance. We argue that introducing preference alignment awareness enables the student to optimize toward reward-preferred generation quality, potentially surpassing the teacher instead of being restricted to rigid teacher imitation. To this end, we propose Reward-Aware Trajectory Shaping (RATS), a lightweight framework for preference-aligned few-step generation. Specifically, teacher and student latent trajectories are aligned at key denoising stages through horizon matching, while a reward-aware gate is introduced to adaptively regulate teacher guidance based on their relative reward performance. Trajectory shaping is strengthened when the teacher achieves higher rewards, and relaxed when the student matches or surpasses the teacher, thereby enabling continued reward-driven improvement. By seamlessly integrating trajectory distillation, reward-aware gating, and preference alignment, RATS effectively transfers preference-relevant knowledge from high-step generators without incurring additional test-time computational overhead. Experimental results demonstrate that RATS substantially improves the efficiency-quality trade-off in few-step visual generation, significantly narrowing the gap between few-step students and stronger multi-step generators.
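The gating behavior described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the gate's functional form or the loss, so everything below is an assumption made for illustration: the clipped linear ramp in `reward_gate`, the `temperature` and `reward_weight` parameters, the mean-squared latent distance, and all function names are hypothetical and are not taken from the paper. The latents are assumed to be already paired at matched denoising stages.

```python
from typing import Sequence


def reward_gate(teacher_reward: float, student_reward: float,
                temperature: float = 1.0) -> float:
    """Hypothetical gate in [0, 1] (assumed form, not from the paper).

    Grows with the teacher's reward advantage and is exactly 0 when the
    student matches or surpasses the teacher.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    advantage = teacher_reward - student_reward
    return min(1.0, max(0.0, advantage / temperature))


def trajectory_distance(student_latents: Sequence[Sequence[float]],
                        teacher_latents: Sequence[Sequence[float]]) -> float:
    """Mean squared distance between latents paired at matched stages."""
    if len(student_latents) != len(teacher_latents):
        raise ValueError("trajectories must have the same number of stages")
    total, count = 0.0, 0
    for s_latent, t_latent in zip(student_latents, teacher_latents):
        if len(s_latent) != len(t_latent):
            raise ValueError("paired latents must have the same size")
        for s, t in zip(s_latent, t_latent):
            total += (s - t) ** 2
            count += 1
    return total / count if count else 0.0


def shaped_loss(student_latents: Sequence[Sequence[float]],
                teacher_latents: Sequence[Sequence[float]],
                student_reward: float, teacher_reward: float,
                reward_weight: float = 1.0,
                temperature: float = 1.0) -> float:
    """Gated trajectory term minus a reward term (assumed combination)."""
    gate = reward_gate(teacher_reward, student_reward, temperature)
    shaping = gate * trajectory_distance(student_latents, teacher_latents)
    # Minimizing the negative reward keeps pushing the student's reward up
    # even when the gate has switched the teacher guidance off.
    return shaping - reward_weight * student_reward
```

In this sketch, a student that already out-scores the teacher receives no trajectory penalty and is driven only by the reward term, which mirrors the "relaxed when the student matches or surpasses the teacher" behavior. A smooth gate (for example a sigmoid of the reward gap) would be an equally plausible choice.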

Computer Vision · Multimodal Models · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
