AfriVox-v2 is a new benchmark for African speech recognition that features unscripted, "in the wild" audio and domain-specific evaluations across ten sectors. It assesses recent speech models such as Sahara-v2, Gemini 3 Flash, and Omnilingual CTC in realistic, noisy African settings. The results reveal a significant generalization gap when modern speech models are applied to specialized African contexts.
Modern speech models struggle to generalize to noisy, domain-specific African speech, highlighting a critical gap for localized voice AI.
Recent large language models (LLMs) show strong speech recognition and translation capabilities for high-resource languages. However, African languages remain dramatically underrepresented in benchmarks, which limits the practical use of these models in low-resource settings. While early benchmarks tested African languages and accents, they lacked exhaustive real-world noise and granular domain evaluations. We present AfriVox-v2, a comprehensive benchmark designed to test speech models under realistic African deployment conditions. AfriVox-v2 introduces "in the wild" unscripted audio for all supported languages. We also introduce strict domain verticalization, evaluating model accuracy across ten sectors, including government, finance, health, and agriculture, and conducting targeted tests on numbers and named entities. Finally, we benchmark a new generation of speech models, including Sahara-v2, Gemini 3 Flash, and the Omnilingual CTC models. Our results expose the true generalization gap of modern speech models in specialized, noisy African contexts and provide a reliable blueprint for developers building localized voice AI.
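To make the idea of per-sector evaluation concrete, the following is a minimal sketch of how accuracy could be broken down by domain. It assumes word error rate (WER) as the metric, which the abstract does not state. The function names (`edit_distance`, `wer_by_domain`), the sample format, and the toy transcripts are all hypothetical illustrations, not part of AfriVox-v2.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_domain(samples):
    """Aggregate WER per domain from (domain, reference, hypothesis) triples.

    Assumption: WER is pooled per domain (total errors / total reference
    words), not averaged per utterance.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for domain, ref, hyp in samples:
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        errors[domain] += edit_distance(ref_tokens, hyp_tokens)
        words[domain] += len(ref_tokens)
    return {d: errors[d] / words[d] for d in words if words[d] > 0}


# Toy, made-up transcripts for illustration only.
samples = [
    ("health", "take two tablets daily", "take two tablets daily"),
    ("finance", "send five thousand naira", "send five thousand nara"),
    ("finance", "check my balance", "check balance"),
]
scores = wer_by_domain(samples)
for domain, score in sorted(scores.items()):
    print(f"{domain}: WER = {score:.3f}")
```

Pooling errors within each domain keeps long utterances from being underweighted. A real evaluation would also need text normalization suited to each language, particularly for numbers and named entities.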