Search papers, labs, and topics across Lattice.
2
0
5
8
Forget text-first: SODA models show that scaling native audio foundation models with interleaved semantic, acoustic, and text tokens unlocks powerful audio generation and cross-modal capabilities.
Speech recognition models stumble badly on real-world street names, especially for non-English speakers, but a simple synthetic data boost can dramatically improve accuracy.