Search papers, labs, and topics across Lattice.
Salesforce AI Research
2
0
5
Whisper's speech-centric training leaves audio-LLMs tone-deaf to music and environmental sounds, but a simple fine-tune can fix that.
Forget slow, end-to-end models: building real-time voice agents hinges on a cascaded streaming pipeline, as demonstrated by a new tutorial achieving sub-second latency.