China University of Mining and Technology; OPPO
Apr 20, 2026
arXiv:2604.17850

UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement

Jingwei Yang, Ruoxi Wu, Wei Shen, Meng Li, Yulong Liu, Huimin She, Lunxi Yuan

AI Summary

The paper introduces UniCSG, a diffusion-based framework for style transfer that addresses content-style entanglement by employing staged training. The first stage uses low-frequency preprocessing and conditioning corruption to achieve semantic disentanglement in the latent space. The second stage refines details with multi-scale frequency supervision, further enhanced by pixel-space reward learning to improve perceptual quality.

Key Contribution

UniCSG achieves high-fidelity style transfer without content leakage by disentangling semantics and frequencies in the latent space of diffusion models.

Abstract

Style transfer must match a target style while preserving content semantics. DiT-based diffusion models often suffer from content-style entanglement, leading to reference-content leakage and unstable generation. We present UniCSG, a unified framework for content-constrained, style-driven generation in both text-guided and reference-guided settings. UniCSG employs staged training: (i) a latent-space semantic disentanglement stage that combines low-frequency preprocessing with conditioning corruption to encourage content-style separation, and (ii) a latent-space frequency-aware detail reconstruction stage that refines details via multi-scale frequency supervision. We further incorporate pixel-space reward learning to align latent objectives with perceptual quality after decoding. Experiments demonstrate improved content faithfulness, style alignment, and robustness in both settings.
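To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch of a low-frequency preprocessing step and a multi-scale frequency loss. It is not the authors' implementation: it operates on plain 2D NumPy arrays rather than diffusion latents, and the function names (`low_pass`, `downsample`, `multiscale_frequency_loss`), the radial cutoff, and the scale factors are all hypothetical choices made for illustration.

```python
import numpy as np


def low_pass(image, cutoff=0.25):
    """Keep only low spatial frequencies of a 2D array (assumed preprocessing).

    `cutoff` is a fraction of the Nyquist frequency (0.5 cycles per pixel);
    everything outside that radius in the Fourier plane is zeroed.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per pixel
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff * 0.5
    return np.fft.ifft2(np.fft.fft2(image) * mask).real


def downsample(image, factor):
    """Average-pool a 2D array by an integer factor, cropping any remainder."""
    h, w = image.shape
    h2, w2 = h - h % factor, w - w % factor
    blocks = image[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.mean(axis=(1, 3))


def multiscale_frequency_loss(pred, target, factors=(1, 2, 4)):
    """Mean L1 distance between FFT magnitudes, averaged over several scales."""
    total = 0.0
    for f in factors:
        p = np.abs(np.fft.fft2(downsample(pred, f)))
        t = np.abs(np.fft.fft2(downsample(target, f)))
        total += np.mean(np.abs(p - t))
    return total / len(factors)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((16, 16))
    smooth = low_pass(img)
    print("loss(img, img)    =", multiscale_frequency_loss(img, img))
    print("loss(img, smooth) =", multiscale_frequency_loss(img, smooth))
```

In this sketch the low-pass step removes fine texture while keeping coarse layout, which is one plausible reading of why low-frequency preprocessing could help separate content from style. The loss compares spectra at several resolutions, so mismatches in both coarse structure and fine detail contribute.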

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
