OmniJigsaw reveals a "bi-modal shortcut phenomenon" in joint audio-visual integration, demonstrating that naive fusion can be surprisingly ineffective and highlighting the importance of carefully designed cross-modal training strategies.
Forget expensive audio-text data collection: TASU2 lets you dial in the perfect amount of noise for training your speech LLM, all from text.
Ditch fixed chunk sizes: TC-BiMamba unlocks faster, more memory-efficient training for bidirectional Mamba ASR models, enabling unified streaming and non-streaming decoding.