Friedrich-Alexander-Universität, Universidad de Antioquia UdeA
May 25, 2026
arXiv:2605.25596

Multilingual Phonological Feature Recognition with Self-Supervised Speech Models

Abner Hernandez, Tomás Arias-Vergara, Daiqi Liu, Andreas Maier, Paula Andrea Pérez-Toro

AI Summary

The authors introduce PhonoQ-2.0, a multilingual frame-level phonological feature recognizer built on self-supervised speech models. It directly predicts a 22-dimensional feature vector per frame encoding manner, vowel quality, place, and voicing. To enforce phonological coherence, it uses a manner-conditioned gating mechanism that activates valid feature groups. Evaluated across multiple languages and corpora, PhonoQ-2.0 achieves an average macro-F1 of 91.3% in-domain and 88.9% out-of-domain, which is +8.8 and +8.6 F1 over a CTC phoneme baseline, and it improves unseen-language macro-F1 by 6.7 points on average (from 66.9% to 73.6%).

Key Contribution

Directly predicting phonological features from speech, rather than deriving them from phoneme outputs, yields consistent gains over a CTC phoneme baseline in multilingual and unseen-language phonological feature recognition.

Abstract

Phonological features provide a language-general and linguistically grounded representation of speech. We present PhonoQ-2.0, a multilingual frame-level phonological feature recognizer built on self-supervised speech models. The system directly predicts a structured 22-dimensional feature vector per frame encoding manner, vowel quality, place, and voicing, instead of deriving features from phoneme outputs. To ensure phonologically coherent predictions, we introduce a manner-conditioned gating mechanism that activates valid feature groups. Evaluated across multiple languages and corpora, PhonoQ-2.0 achieves an average macro-F1 of 91.3% in-domain and 88.9% out-of-domain. Compared to a strong CTC phoneme baseline, it delivers consistent gains of +8.8 F1 in-domain and +8.6 out-of-domain on average. In unseen-language evaluation, PhonoQ-2.0 improves macro-F1 from 66.9% to 73.6% (+6.7 on average), with gains of up to +10.8 points.
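The manner-conditioned gating described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the split of the 22 dimensions into groups (6 manner, 6 vowel quality, 8 place, 2 voicing), the manner class names, the table of which groups each manner licenses, and the use of a hard argmax gate (the actual system may gate softly or differently).

```python
import math

# ASSUMPTION: hypothetical layout of the 22-dimensional frame vector.
GROUPS = {
    "manner": range(0, 6),
    "vowel_quality": range(6, 12),
    "place": range(12, 20),
    "voicing": range(20, 22),
}

# ASSUMPTION: hypothetical manner classes, one per manner dimension.
MANNERS = ["silence", "vowel", "stop", "fricative", "nasal", "approximant"]

# ASSUMPTION: hypothetical table of feature groups that each manner licenses.
VALID_GROUPS = {
    "silence": [],
    "vowel": ["vowel_quality", "voicing"],
    "stop": ["place", "voicing"],
    "fricative": ["place", "voicing"],
    "nasal": ["place", "voicing"],
    "approximant": ["place", "voicing"],
}


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate_frame(logits):
    """Apply a hard manner-conditioned gate to one frame of 22 logits.

    Returns the predicted manner and the per-feature probabilities, with
    every feature outside the licensed groups forced to 0.0.
    """
    if len(logits) != 22:
        raise ValueError("expected 22 logits per frame")

    probs = [sigmoid(x) for x in logits]

    # Pick the most likely manner among the manner dimensions.
    manner_idx = max(GROUPS["manner"], key=lambda i: logits[i])
    manner = MANNERS[manner_idx - GROUPS["manner"].start]

    # Manner dimensions always stay active; other groups depend on the manner.
    active = set(GROUPS["manner"])
    for group in VALID_GROUPS[manner]:
        active.update(GROUPS[group])

    gated = [p if i in active else 0.0 for i, p in enumerate(probs)]
    return manner, gated


if __name__ == "__main__":
    frame = [0.0] * 22
    frame[1] = 3.0   # strong evidence for the hypothetical "vowel" manner
    frame[6] = 2.0   # a vowel-quality feature
    frame[12] = 2.0  # a place feature, which the gate should suppress
    manner, gated = gate_frame(frame)
    print(manner, [round(p, 2) for p in gated])
```

In this sketch a frame classified as a vowel keeps its vowel-quality and voicing predictions while its place features are zeroed, so the output never combines feature groups that the assumed table treats as incompatible.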

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
