Search papers, labs, and topics across Lattice.
2
0
6
0
Explicitly aligning audio and video streams in a multimodal Transformer boosts emotion recognition, showing that ignoring frame-rate differences hurts performance.
Forget full-network finetuning: adapting only a low-dimensional decoder subspace unlocks state-of-the-art zero-shot depth completion with significantly improved efficiency.