This paper introduces an abstention-aware ASR framework that allows models to abstain from transcribing uncertain segments, improving reliability. The authors propose a new metric, RAS, which evaluates ASR reliability by balancing informativeness against error aversion, with the trade-off calibrated to human preferences. Experiments show that training with supervised bootstrapping followed by reinforcement learning substantially improves transcription reliability without sacrificing accuracy.
ASR systems can now be more trustworthy: this work shows how to train them to abstain from transcribing uncertain segments, leading to more reliable outputs.
Automatic speech recognition systems often produce confident yet incorrect transcriptions under noisy or ambiguous conditions, which can be misleading for both users and downstream applications. Standard evaluation based on Word Error Rate focuses solely on accuracy and fails to capture transcription reliability. We introduce an abstention-aware transcription framework that enables ASR models to explicitly abstain from uncertain segments. To evaluate reliability under abstention, we propose RAS, a reliability-oriented metric that balances transcription informativeness and error aversion, with its trade-off parameter calibrated by human preference. We then train an abstention-aware ASR model through supervised bootstrapping followed by reinforcement learning. Our experiments demonstrate substantial improvements in transcription reliability while maintaining competitive accuracy.
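The paper does not give the formula for RAS here, but the description, a score that rewards informative output while penalizing confident errors more heavily than abstentions, suggests a shape like the following minimal sketch. Everything in it is an assumption for illustration: the function name, the counting of words into correct/erroneous/abstained buckets, and the exact functional form are hypothetical, with `lam` standing in for the trade-off parameter the authors calibrate from human preferences.

```python
# Hypothetical sketch of a reliability score in the spirit of RAS.
# Correctly transcribed words earn credit (informativeness); erroneous
# words incur a lambda-weighted penalty (error aversion); abstained
# words earn nothing but incur no penalty. This is NOT the paper's
# definition, only an illustration of the stated trade-off.

def ras_like_score(num_correct: int, num_errors: int,
                   num_abstained: int, lam: float = 1.0) -> float:
    """Toy reliability score over a reference of N words.

    num_correct   -- reference words transcribed correctly
    num_errors    -- reference words transcribed incorrectly
    num_abstained -- reference words the model declined to transcribe
    lam           -- trade-off between informativeness and error aversion
    """
    n = num_correct + num_errors + num_abstained
    if n == 0:
        return 0.0
    informativeness = num_correct / n   # credit for useful output
    error_rate = num_errors / n         # confident mistakes
    # Abstaining beats guessing whenever the lambda-weighted expected
    # error cost exceeds the expected informativeness gain.
    return informativeness - lam * error_rate


if __name__ == "__main__":
    # With lam = 2, abstaining on two uncertain words scores higher
    # than guessing them and getting both wrong.
    print(ras_like_score(num_correct=8, num_errors=2, num_abstained=0, lam=2.0))  # 0.4
    print(ras_like_score(num_correct=8, num_errors=0, num_abstained=2, lam=2.0))  # 0.8
```

Under a score of this shape, a larger `lam` makes the model more conservative, which is why the paper calibrates the trade-off against human preferences rather than fixing it arbitrarily.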