This survey reviews the evolution of accent conversion techniques, from early DSP methods to modern neural architectures, highlighting challenges in data alignment, representation disentanglement, and resource scarcity. It analyzes how linguistic considerations and application requirements influence the trade-off between accent modification and speaker identity preservation. The survey identifies persistent challenges and suggests future research directions for controllable and perceptually consistent accent conversion.
Accent conversion must balance accent modification against speaker identity preservation; this survey examines that trade-off by tracing the evolution of techniques from DSP to neural methods.
Accent conversion has progressed rapidly alongside growing interest in improving global cross-cultural communication. This survey presents an overview of the evolution of accent conversion methodologies, analyzing how the field has developed in response to fundamental challenges related to data alignment, representation disentanglement, and resource scarcity. We trace the progression from early rule-based digital signal processing approaches, such as spectral manipulation and formant-based analysis, to modern neural architectures capable of flexible and reference-free accent transformation. In addition, the survey situates accent conversion within its linguistic foundations and examines how different application requirements impose varying constraints on the balance between accent modification and speaker identity preservation. Finally, it reviews commonly used speech datasets and evaluation methodologies, identifies persistent challenges, and outlines directions for future research aimed at achieving more controllable and perceptually consistent accent conversion.