Mistral AI
French AI lab building open and portable generative AI models. Known for Mistral and Mixtral model families.
mistral.ai
Recent Papers
The paper introduces Voxtral Realtime, an automatic speech recognition (ASR) model designed for native streaming with sub-second latency. Unlike chunking-based approaches, which adapt an offline model to streaming, Voxtral Realtime is trained end-to-end for streaming with explicit audio-text alignment. It builds on the Delayed Streams Modeling framework and adds a new causal audio encoder and Ada RMS-Norm for improved delay conditioning. After large-scale pretraining across 13 languages, the model achieves performance comparable to Whisper while operating at a delay of 480 ms.
Presents Voxtral Realtime, a natively streaming ASR model that matches offline transcription quality at sub-second latency through end-to-end training and explicit audio-text stream alignment.

