Geometry, not token discreteness, is the key to unlocking superior performance in speech-to-LLM integration.
Encoder-free speech modeling can rival traditional methods, challenging the necessity of dedicated speech encoders in LLM architectures.
Applying LLMs' reasoning abilities to speech opens a new ASR paradigm: having the model "think" before transcribing cuts error rates by up to 17%.
Agent systems leveraging iterative tool orchestration and cross-modal analysis significantly outperform single models in audio reasoning, highlighting a promising path toward explainable audio intelligence.