University of Maryland
Generate semantically aligned, high-fidelity music for videos with unprecedented speed and control by combining autoregressive planning and diffusion.
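The summary names the key design split (an autoregressive stage that plans, then a diffusion stage that synthesizes), which a toy sketch can make concrete. Below is a minimal, hypothetical PyTorch version; every module, shape, and schedule is an illustrative assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ARPlanner(nn.Module):
    """Stage 1 (assumed interface): autoregressively emit coarse
    semantic 'plan' tokens from a pooled video feature."""
    def __init__(self, vocab=512, dim=128, video_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim + video_dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    @torch.no_grad()
    def plan(self, video, steps=16):
        tok = torch.zeros(video.size(0), 1, dtype=torch.long)  # BOS = 0
        h, out = None, []
        for _ in range(steps):
            x = torch.cat([self.embed(tok), video.unsqueeze(1)], dim=-1)
            y, h = self.rnn(x, h)
            tok = self.head(y[:, -1]).argmax(-1, keepdim=True)  # greedy decode
            out.append(tok)
        return torch.cat(out, dim=1)  # (B, steps) coarse plan tokens

class PlanConditionedDenoiser(nn.Module):
    """Stage 2 (assumed interface): predict the noise in an audio latent,
    conditioned on the plan embedding and the diffusion timestep."""
    def __init__(self, latent_dim=64, plan_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + plan_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, latent_dim))

    def forward(self, z_t, plan_emb, t):
        t_feat = t.float().unsqueeze(-1) / 1000.0  # scalar timestep feature
        return self.net(torch.cat([z_t, plan_emb, t_feat], dim=-1))

@torch.no_grad()
def sample(denoiser, plan_emb, steps=50, latent_dim=64):
    """Standard DDPM reverse loop with a linear beta schedule."""
    z = torch.randn(plan_emb.size(0), latent_dim)
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas, alpha_bars = 1 - betas, torch.cumprod(1 - betas, dim=0)
    for i in reversed(range(steps)):
        t = torch.full((plan_emb.size(0),), i)
        eps = denoiser(z, plan_emb, t)
        z = (z - (1 - alphas[i]) / torch.sqrt(1 - alpha_bars[i]) * eps) \
            / torch.sqrt(alphas[i])
        if i > 0:
            z = z + torch.sqrt(betas[i]) * torch.randn_like(z)
    return z

# Wire the stages together: plan autoregressively, then diffuse.
video = torch.randn(2, 128)                   # pooled video features
planner, denoiser = ARPlanner(), PlanConditionedDenoiser()
plan_emb = planner.embed(planner.plan(video)).mean(dim=1)
audio_latents = sample(denoiser, plan_emb)    # (2, 64), to be decoded to audio
```

The split is the point: the cheap autoregressive pass fixes the coarse semantic trajectory, so the diffusion pass only has to fill in high-fidelity detail under that plan.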
Audio-language models can now reason about 30-minute-long audio clips with timestamp-grounded intermediate steps, unlocking a new level of fine-grained understanding.
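The notable part is the output format: each intermediate reasoning step cites the audio span it relies on. Purely as an illustrative sketch (the schema and field names below are assumptions, not the paper's), such a trace might look like:

```python
# Hypothetical schema for a timestamp-grounded reasoning trace over a
# 30-minute clip; spans are (start_s, end_s) in seconds.
trace = [
    {"step": "Speaker A poses a question about the budget.",
     "span_s": (12.0, 48.5)},        # grounded in 0:12-0:48
    {"step": "A musical interlude separates the two segments.",
     "span_s": (305.2, 331.0)},
    {"step": "Speaker B answers the question from step 1.",
     "span_s": (1502.7, 1533.4)},    # grounded near 25:03
]
answer = "The question at ~0:12 is resolved around 25:03."
```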
Current multimodal models are surprisingly bad at understanding long, complex videos, struggling to integrate audio, visual, and text cues even for basic reasoning tasks.
Forget HRTFs: a differentiable multi-sphere scattering model inspired by underwater animal acoustics offers a new foundation for spatial audio processing and localization.
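A faithful multi-sphere scattering implementation is beyond a blurb, but the underlying idea of a differentiable forward acoustic model can be shown with a deliberately simpler stand-in: a free-field delay-and-attenuation model in PyTorch, with the source position recovered by gradient descent through it. All constants and shapes below are assumptions for illustration, not the paper's method.

```python
import torch

C = 343.0                                  # speed of sound (m/s)
freqs = torch.linspace(100.0, 400.0, 16)   # low freqs keep the loss smoother
omega = 2 * torch.pi * freqs
mics = torch.tensor([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.3, 0.3]])

def forward_model(src):
    """Complex pressure at each mic: 1/r spreading plus phase delay.
    (A free-field stand-in, NOT the paper's multi-sphere scattering.)"""
    d = torch.linalg.norm(mics - src, dim=1)                 # (4,) distances
    return torch.exp(-1j * omega[None, :] * d[:, None] / C) / d[:, None]

# Simulate observations from a hidden source, then localize it by
# backpropagating the mismatch through the forward model.
true_src = torch.tensor([1.2, 0.7])
obs = forward_model(true_src)

est = torch.tensor([0.8, 0.4], requires_grad=True)  # a decent init matters:
opt = torch.optim.Adam([est], lr=0.02)              # phase terms are multimodal
for _ in range(500):
    opt.zero_grad()
    loss = (forward_model(est) - obs).abs().pow(2).sum()
    loss.backward()
    opt.step()
print(est.detach())  # moves toward tensor([1.2, 0.7])
```

Swap `forward_model` for a richer scatterer and the same recipe applies, which is what makes a fully differentiable acoustic model attractive as a foundation for localization.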