A low word error rate (WER) can be a mirage: speech compressed into "pure" semantic tokens can reach near-perfect WER and still produce unintelligible speech when those tokens are used for generation.
Forget specialized architectures: StepAudio 2.5 shows that a single audio-language foundation model, shaped by RLHF, can excel at ASR, TTS, and real-time dialogue simultaneously.