Mar 16, 2026 · arXiv:2603.15512

FreeTalk: Emotional Topology-Free 3D Talking Heads

Federico Nocentini, Thomas Besnier, Claudio Ferrari, Stefano Berretti, Mohamed Daoudi

AI Summary

FreeTalk is a two-stage framework for emotion-conditioned 3D talking-head animation that operates on unregistered face meshes with arbitrary topology. The first stage, Audio-To-Sparse (ATS), predicts temporally coherent 3D landmark displacements from speech audio, conditioned on an emotion category and intensity. The second stage, Sparse-To-Mesh (STM), transfers the landmark motion to a target mesh using intrinsic surface features and landmark-to-vertex conditioning, generating dense per-vertex deformations without template fitting.

Key Contribution

FreeTalk animates any face scan without requiring a template mesh or a fixed topology, and conditions the resulting motion on an emotion category and intensity.

Abstract

Speech-driven 3D facial animation has advanced rapidly, yet most approaches remain tied to registered template meshes, preventing effective deployment on raw 3D scans with arbitrary topology. At the same time, modeling controllable emotional dynamics beyond lip articulation remains challenging, and is often tied to template-based parameterizations. We address these challenges by proposing FreeTalk, a two-stage framework for emotion-conditioned 3D talking-head animation that generalizes to unregistered face meshes with arbitrary vertex count and connectivity. First, Audio-To-Sparse (ATS) predicts a temporally coherent sequence of 3D landmark displacements from speech audio, conditioned on an emotion category and intensity. This sparse representation captures both articulatory and affective motion while remaining independent of mesh topology. Second, Sparse-To-Mesh (STM) transfers the predicted landmark motion to a target mesh by combining intrinsic surface features with landmark-to-vertex conditioning, producing dense per-vertex deformations without template fitting or correspondence supervision at test time. Extensive experiments show that FreeTalk matches specialized baselines when trained in-domain, while providing substantially improved robustness to unseen identities and mesh topologies. Code and pre-trained models will be made publicly available.
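The data flow described in the abstract can be illustrated with a minimal sketch. It covers only the second step (moving every vertex of a mesh from a handful of landmark displacements) and replaces the learned STM model with a simple Gaussian distance weighting. That weighting, the function names `transfer_landmark_motion` and `animate`, and the `sigma` parameter are assumptions made for illustration; they are not taken from the paper, whose STM relies on intrinsic surface features and learned landmark-to-vertex conditioning.

```python
import math


def transfer_landmark_motion(vertices, landmarks, displacements, sigma=0.5):
    """Spread sparse landmark displacements over all vertices of one mesh.

    Illustrative stand-in for a learned Sparse-To-Mesh stage: each vertex is
    moved by a Gaussian-weighted average of the landmark displacements.
    Only vertex positions are used, so any vertex count or connectivity works.
    """
    deformed = []
    for v in vertices:
        # Weight of each landmark, decaying with Euclidean distance to the vertex.
        weights = [
            math.exp(-math.dist(v, lm) ** 2 / (2.0 * sigma ** 2))
            for lm in landmarks
        ]
        total = sum(weights)
        if total == 0.0:
            # Vertex is too far from every landmark to be influenced.
            deformed.append(tuple(v))
            continue
        offset = [
            sum(w * d[axis] for w, d in zip(weights, displacements)) / total
            for axis in range(3)
        ]
        deformed.append(tuple(v[axis] + offset[axis] for axis in range(3)))
    return deformed


def animate(vertices, landmarks, displacement_sequence, sigma=0.5):
    """Apply one set of landmark displacements per frame to the rest mesh."""
    return [
        transfer_landmark_motion(vertices, landmarks, frame, sigma)
        for frame in displacement_sequence
    ]


# Hypothetical toy data: two landmarks, one of which moves up by one unit.
landmarks = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
frame = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]

# The same landmark motion drives meshes with different vertex counts.
coarse_mesh = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
dense_mesh = coarse_mesh + [(2.5, 0.0, 0.0), (7.5, 0.0, 0.0)]
coarse_frames = animate(coarse_mesh, landmarks, [frame, frame])
dense_frames = animate(dense_mesh, landmarks, [frame, frame])
```

In this sketch the per-frame landmark displacements would come from the first stage (ATS), which is not modeled here. The point is only that a sparse, mesh-independent motion signal can drive meshes of different resolution without any vertex correspondence between them.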

Computer Vision, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
