Explicitly aligning the audio and video streams in a multimodal Transformer improves emotion recognition, showing that ignoring the frame-rate mismatch between the two modalities hurts performance.
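
To make the frame-rate mismatch concrete, here is a minimal sketch of one possible alignment step: average-pooling audio feature frames so that each video frame gets exactly one audio vector. This is an illustration under stated assumptions, not the method from the summarized work; the function name `align_audio_to_video`, the integer frame rates, and the choice of average pooling are all hypothetical.

```python
from typing import List


def align_audio_to_video(
    audio: List[List[float]],
    audio_rate: int,
    video_rate: int,
    n_video: int,
) -> List[List[float]]:
    """Average-pool audio feature frames into one vector per video frame.

    Hypothetical helper: assumes both streams start at t = 0 and that the
    frame rates are integers in Hz (frames per second).
    """
    dim = len(audio[0])
    sums = [[0.0] * dim for _ in range(n_video)]
    counts = [0] * n_video

    for i, frame in enumerate(audio):
        # Audio frame i starts at i / audio_rate seconds; integer arithmetic
        # maps it to the video frame covering that instant without rounding error.
        v = i * video_rate // audio_rate
        if v >= n_video:
            break  # audio extends past the last video frame
        counts[v] += 1
        for d, x in enumerate(frame):
            sums[v][d] += x

    # Video frames with no audio frames keep an all-zero vector.
    return [
        [s / c for s in row] if c else row
        for row, c in zip(sums, counts)
    ]


# Example: 8 one-dimensional audio frames at 100 Hz, 2 video frames at 25 Hz.
audio_feats = [[float(i)] for i in range(8)]
aligned = align_audio_to_video(audio_feats, audio_rate=100, video_rate=25, n_video=2)
print(aligned)  # [[1.5], [5.5]]
```

After this step both modalities have the same sequence length, so token `t` in the audio sequence and token `t` in the video sequence describe the same time window before they are fed to a Transformer. Learned alignment (for example, cross-modal attention) is an alternative to fixed pooling.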