Gargi Memorial Institute of TechnologyIIT MadrasApr 9, 2026arXiv:2604.08037

PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation

Soumya Mazumdar, Vineet Kumar Rakesh, Tapas Samanta

AI Summary

PrivFedTalk is introduced, a federated learning framework for personalized talking-head generation that addresses privacy concerns by training a shared diffusion backbone and client-specific LoRA identity adapters on local data. To handle heterogeneous client data, they propose Identity-Stable Federated Aggregation (ISFA), which weights client updates based on on-device identity consistency and temporal stability. They also introduce Temporal-Denoising Consistency (TDC) regularization to improve temporal stability and reduce identity drift.

Key Contribution

Personalized talking-head generation can now be trained in a privacy-preserving federated setting, achieving stable optimization and successful end-to-end training under constrained resources.

Abstract

Talking-head generation has advanced rapidly with diffusion-based generative models, but training usually depends on centralized face-video and speech datasets, raising major privacy concerns. The problem is more acute for personalized talking-head generation, where identity-specific data are highly sensitive and often cannot be pooled across users or devices. PrivFedTalk is presented as a privacy-aware federated framework for personalized talking-head generation that combines conditional latent diffusion with parameter-efficient identity adaptation. A shared diffusion backbone is trained across clients, while each client learns lightweight LoRA identity adapters from local private audio-visual data, avoiding raw data sharing and reducing communication cost. To address heterogeneous client distributions, Identity-Stable Federated Aggregation (ISFA) weights client updates using privacy-safe scalar reliability signals computed from on-device identity consistency and temporal stability estimates. Temporal-Denoising Consistency (TDC) regularization is introduced to reduce inter-frame drift, flicker, and identity drift during federated denoising. To limit update-side privacy risk, secure aggregation and client-level differential privacy are applied to adapter updates. The implementation supports both low-memory GPU execution and multi-GPU client-parallel training on heterogeneous shared hardware. Comparative experiments on the present setup across multiple training and aggregation conditions with PrivFedTalk, FedAvg, and FedProx show stable federated optimization and successful end-to-end training and evaluation under constrained resources. The results support the feasibility of privacy-aware personalized talking-head training in federated environments, while suggesting that stronger component-wise, privacy-utility, and qualitative claims need further standardized evaluation.

Computer Vision Distributed Systems & Hardware Speech & Audio

Citation Metrics

Citations0

Influential citations0

References39

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation

Related Papers