A low word error rate (WER) can be a mirage: speech compressed to "pure" semantic tokens can score near-perfect WER yet still produce unintelligible speech when used for generation.
RLVR, the dominant training paradigm for audio language models, may be turning them into unfeeling "answering machines" that excel on benchmarks but fail the vibe check.