Low Word Error Rate can be a mirage: compressing speech to "pure" semantic tokens, even with near-perfect WER, produces unintelligible speech when used for generation.
Forget specialized architectures: StepAudio 2.5 proves a single audio-language foundation, shaped by RLHF, can dominate ASR, TTS, and real-time dialogue simultaneously.
Over 96% of real-world MCP servers using OAuth for authentication suffer from dynamic client registration flaws, potentially leading to sensitive information leakage and account takeover.
RLVR, the dominant training paradigm for audio language models, may be turning them into unfeeling "answering machines" that excel on benchmarks but fail the vibe check.