The paper introduces spINAch, a 320-hour diachronic corpus of French broadcast speech spanning 1955-2015, carefully balanced for speaker age and gender. The corpus was automatically transcribed and phonetically aligned, enabling analysis at the phoneme level. Analysis of over 3 million oral vowels reveals insights into the evolution of Parisian French, including voice pitch changes over time and the neutralization of the /a/-/ɑ/ opposition.
A new 320-hour diachronic French speech corpus, spINAch, reveals surprising trends in Parisian French pronunciation over 60 years, including the absence of gender-specific voice pitch evolution.
We present spINAch, a large diachronic corpus of French speech from radio and television archives, balanced by speaker gender and age (20-95 years old) and spanning 60 years, from 1955 to 2015. The dataset includes over 320 hours of recordings from more than two thousand speakers. We describe the methodology for building the corpus, focusing on the acoustic quality of the collected samples. The data were automatically transcribed and phonetically aligned to allow studies at the phonemic level. More than 3 million oral vowels were analyzed to provide fundamental frequency and formant measurements. The corpus, available to the community for research purposes, is valuable for describing the evolution of Parisian French through the representation of gender and age. The presented analyses also demonstrate that the diachronic nature of the corpus allows the observation of various phonetic phenomena, such as the evolution of voice pitch over time (which does not differ by gender in our data) and the neutralization of the /a/-/ɑ/ opposition in Parisian French during this period.