This paper introduces Ouvia, a user-centered framework for evaluating the usability of speech translation (ST) in real-world communication, focusing on one-to-one interactions in which an English speaker conveys a request to a Portuguese speaker. In a study of more than 1,750 interactions spanning healthcare and everyday situations, four ST systems, three English dialects, and two genders, the authors find that only around half of the interactions are rated as usable, with significant gaps in reported usability across demographic groups. Among quality metrics, QA-based evaluation is a substantially stronger predictor of real-world usability than standard approaches, underscoring the need for situated, user-centered evaluation.
Only around half of speech translation interactions are rated as usable, revealing usability gaps across demographic groups that standard evaluations overlook.
Speech translation (ST) is increasingly adopted in user applications, yet its evaluation largely focuses on decontextualized testbeds and holistic quality, rather than end users' communication needs. We introduce Ouvia, an evaluation framework for measuring user-perceived usability of speech translation outputs in real-world settings. Ouvia focuses on one-to-one communication: an English speaker needs to convey a request to a Portuguese speaker, and the message is automatically translated. Through a custom web app and multi-phase study design, we collect more than 1,750 such interactions in healthcare and everyday situations, mediated by four ST systems, involving speakers from three English dialects and two genders. We find that modern ST serves people only to a limited extent -- only around half of interactions are rated as usable -- with significant gaps in reported usability across demographic groups. Moreover, among quality metrics, we find that QA-based evaluation is a substantially stronger predictor of real-world usability than standard approaches. Together, these findings stress the importance of situated, user-centered evaluation frameworks that go beyond holistic quality scores and attend to who the technology serves -- and how well.