This paper introduces A$^2$BFR, an attribute-aware blind face restoration framework that combines high-fidelity reconstruction with prompt-controllable generation on a Diffusion Transformer backbone. The method uses attribute-aware learning, which supervises denoising latents with facial attribute embeddings, and semantic dual-training on a new AttrFace-90K dataset to improve prompt controllability. Experiments report state-of-the-art restoration fidelity and instruction adherence, improving on diffusion-based BFR baselines by 0.0467 in LPIPS (lower is better) and by +52.58% in attribute accuracy.
Finally, a blind face restoration method that doesn't just hallucinate details, but lets you precisely control facial attributes via text prompts while maintaining high fidelity.
Blind face restoration (BFR) aims to recover high-quality facial images from degraded inputs, yet its inherently ill-posed nature leads to ambiguous and uncontrollable solutions. Recent diffusion-based BFR methods improve perceptual quality but remain uncontrollable, whereas text-guided face editing enables attribute manipulation without reliable restoration. To address these issues, we propose A$^2$BFR, an attribute-aware blind face restoration framework that unifies high-fidelity reconstruction with prompt-controllable generation. Built upon a Diffusion Transformer backbone with unified image-text cross-modal attention, A$^2$BFR jointly conditions the denoising trajectory on both degraded inputs and textual prompts. To inject semantic priors, we introduce attribute-aware learning, which supervises denoising latents using facial attribute embeddings extracted by an attribute-aware encoder. To further enhance prompt controllability, we introduce semantic dual-training, which leverages the pairwise attribute variations in our newly curated AttrFace-90K dataset to enforce attribute discrimination while preserving fidelity. Extensive experiments demonstrate that A$^2$BFR achieves state-of-the-art performance in both restoration fidelity and instruction adherence, improving on diffusion-based BFR baselines by 0.0467 in LPIPS (lower is better) and by +52.58% in attribute accuracy, while enabling fine-grained, prompt-controllable restoration even under severe degradations.
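The abstract does not give the loss formulations, so the following is only a minimal sketch of how the two training signals could look, under loudly stated assumptions: attribute-aware learning is modeled as a cosine alignment between a feature pooled from the denoising latent and an attribute embedding, and semantic dual-training is modeled as a margin loss over a pair of attribute embeddings that differ in one attribute. The function names, the cosine form, and the `margin` value are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero


def attribute_alignment_loss(latent_feat, attr_emb):
    """Hypothetical attribute-aware term: pull a feature derived from the
    denoising latent toward the facial attribute embedding."""
    return 1.0 - cosine(latent_feat, attr_emb)


def dual_margin_loss(latent_feat, attr_emb_match, attr_emb_other, margin=0.2):
    """Hypothetical dual-training term: the latent feature should be closer
    to the embedding of the prompted attribute than to the embedding of the
    paired, differing attribute by at least `margin`."""
    pos = cosine(latent_feat, attr_emb_match)
    neg = cosine(latent_feat, attr_emb_other)
    return max(0.0, margin - (pos - neg))


# Toy vectors standing in for pooled latent features and attribute embeddings.
feat = [1.0, 0.0]
emb_prompted = [1.0, 0.0]
emb_paired = [0.0, 1.0]

align = attribute_alignment_loss(feat, emb_prompted)
dual_ok = dual_margin_loss(feat, emb_prompted, emb_paired)
dual_bad = dual_margin_loss(feat, emb_paired, emb_prompted)
```

In this toy setup the alignment term is near zero when the feature matches the prompted attribute, and the margin term is positive only when the feature sits closer to the wrong member of the attribute pair. In a real training loop these terms would presumably be added to the diffusion denoising objective with weighting coefficients, which the abstract does not specify.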