CNRS · Feb 23, 2026 · arXiv:2602.19991

Cross-lingual Matryoshka Representation Learning across Speech and Text

Yaya Sy, Dioula Doucouré, Christophe Cerisara, Irina Illina

AI Summary

The paper introduces a bilingual speech-text Matryoshka embedding model for French and Wolof that retrieves French text directly from Wolof speech queries, bypassing ASR-translation pipelines. The authors curate large-scale datasets and new benchmarks, and find that modality fusion within a frozen text Matryoshka model achieves the best retrieval performance. The model also generalizes to tasks it was not trained for, such as speech intent detection, and the analysis shows that key information is concentrated in a few Matryoshka components, which suggests room for efficiency gains.

Key Contribution

Skip the ASR-translation pipeline: a new bilingual speech-text embedding model lets you retrieve French text directly from Wolof speech.
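Once a speech query and the French passages live in one embedding space, retrieval reduces to nearest-neighbour search by cosine similarity. The minimal sketch that follows assumes a Wolof speech encoder and a French text encoder that map into the same space; both encoders are hypothetical here and are replaced by random placeholder vectors, and `l2_normalize` and `retrieve` are illustrative names, not the paper's code.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def retrieve(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k documents most cosine-similar to one query vector."""
    q = l2_normalize(query_emb[None, :])[0]
    d = l2_normalize(doc_embs)
    scores = d @ q                 # cosine similarity of every document to the query
    return np.argsort(-scores)[:k]  # indices sorted by descending similarity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder for French text embeddings (a real system would use a text encoder).
    docs = rng.normal(size=(100, 768))
    # Placeholder for a Wolof speech-query embedding close to document 42.
    query = docs[42] + 0.1 * rng.normal(size=768)
    print(retrieve(query, docs, k=1))  # document 42 ranks first
```

No transcription or translation step appears anywhere: the only per-query work is one encoder pass plus a similarity search over precomputed document vectors.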

Abstract

Speakers of under-represented languages face both a language barrier, as most online knowledge is in a few dominant languages, and a modality barrier, since information is largely text-based while many languages are primarily oral. We address this for French-Wolof by training the first bilingual speech-text Matryoshka embedding model, enabling efficient retrieval of French text from Wolof speech queries without relying on costly ASR-translation pipelines. We introduce large-scale data curation pipelines and new benchmarks, compare modeling strategies, and show that modality fusion within a frozen text Matryoshka model performs best. Although trained only for retrieval, the model generalizes well to other tasks, such as speech intent detection, indicating that it learns general semantic representations. Finally, we analyze cost-accuracy trade-offs across Matryoshka dimensions and ranks, showing that information is concentrated in only a few components, which suggests potential for efficiency improvements.
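The cost-accuracy trade-off comes from the Matryoshka property: a single vector is trained so that its first d coordinates already form a usable embedding. The sketch that follows shows the generic recipe, the same in-batch contrastive (InfoNCE) loss applied to several nested prefixes and summed, and then scores retrieval using only a prefix. The prefix sizes (64, 128, 256, 768), temperature 0.05, uniform weights, and all function names are assumptions for illustration; this is not the paper's training configuration, and it does not reproduce its modality-fusion architecture.

```python
import numpy as np


def truncate(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates (the Matryoshka prefix) and re-normalize rows."""
    prefix = emb[..., :dim]
    return prefix / np.clip(np.linalg.norm(prefix, axis=-1, keepdims=True), 1e-12, None)


def info_nce(q: np.ndarray, t: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive loss: row i of q should match row i of t (assumed hyperparameter)."""
    logits = (q @ t.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def matryoshka_loss(q, t, dims=(64, 128, 256, 768), weights=None) -> float:
    """Sum the contrastive loss over nested prefixes so each prefix is trained to work alone."""
    if weights is None:
        weights = [1.0] * len(dims)  # uniform weighting is an assumption
    return sum(w * info_nce(truncate(q, d), truncate(t, d)) for w, d in zip(weights, dims))


def recall_at_1(q: np.ndarray, t: np.ndarray, dim: int) -> float:
    """Fraction of queries whose paired document ranks first using only `dim` coordinates."""
    scores = truncate(q, dim) @ truncate(t, dim).T
    return float(np.mean(scores.argmax(axis=1) == np.arange(len(q))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    texts = rng.normal(size=(32, 768))                    # placeholder French text embeddings
    speech = texts + 0.5 * rng.normal(size=texts.shape)   # placeholder paired speech embeddings
    print("loss:", matryoshka_loss(speech, texts))
    for d in (64, 128, 256, 768):
        # float32 storage cost per stored vector grows linearly with the prefix length
        print(f"dim={d:4d}  bytes/vector={4 * d:5d}  recall@1={recall_at_1(speech, texts, d):.2f}")
```

Because index size and similarity cost scale linearly with d, finding that most information sits in the leading components means a short prefix can replace the full vector at a fraction of the storage and compute.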

Multimodal Models · Recommendation & Information Retrieval · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
