The paper introduces RECOVER, an agentic framework that uses an LLM as a tool-using agent to correct entity recognition errors in ASR outputs, particularly for rare and domain-specific terms. RECOVER leverages multiple ASR hypotheses generated using different strategies (1-Best, Entity-Aware Select, ROVER Ensemble, and LLM-Select) to retrieve relevant entities and apply constrained LLM correction. Experiments on five datasets demonstrate that RECOVER achieves 8-46% relative reductions in entity-phrase word error rate (E-WER) and increases recall by up to 22 percentage points, with LLM-Select showing the best overall performance.
An agentic framework cuts entity-phrase word error rate in ASR by up to 46% (relative) by combining multiple ASR hypotheses with constrained LLM correction.
Entity recognition in Automatic Speech Recognition (ASR) is challenging for rare and domain-specific terms. In domains such as finance, medicine, and air traffic control, these errors are costly. If the entities are entirely absent from the ASR output, post-ASR correction becomes difficult. To address this, we introduce RECOVER, an agentic correction framework that serves as a tool-using agent. It leverages multiple ASR hypotheses as evidence, retrieves relevant entities, and applies Large Language Model (LLM) correction under constraints. The hypotheses are generated using different strategies, namely 1-Best, Entity-Aware Select, Recognizer Output Voting Error Reduction (ROVER) Ensemble, and LLM-Select. Evaluated across five diverse datasets, RECOVER achieves 8-46% relative reductions in entity-phrase word error rate (E-WER) and increases recall by up to 22 percentage points. LLM-Select achieves the best overall performance in entity correction while maintaining overall WER.
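The abstract reports two kinds of change: a relative reduction in E-WER and an absolute gain in recall measured in percentage points. The sketch below illustrates the difference. It is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, the word error rate is the standard word-level edit distance divided by reference length, and the numbers in the usage note are made up for the arithmetic only.

```python
def word_edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Standard WER: edit distance over the number of reference words.

    Assumption: applying this to entity phrases only is one plausible
    reading of "entity-phrase WER"; the paper's exact definition may differ.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


def relative_reduction(baseline, corrected):
    """Fraction of the baseline error removed, e.g. 0.46 for a 46% relative reduction."""
    return (baseline - corrected) / baseline


def percentage_point_gain(before, after):
    """Absolute change between two rates, expressed in percentage points."""
    return (after - before) * 100
```

With illustrative values, an E-WER that falls from 0.50 to 0.27 is a 46% relative reduction, while a recall that rises from 0.60 to 0.82 is a gain of 22 percentage points. The two figures are not interchangeable: the first is scaled by the baseline, the second is a plain difference.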