DINOv2 visual features and Wav2Vec 2.0 audio features can be effectively fused in a two-stage model to achieve state-of-the-art facial expression recognition in challenging, unconstrained video conditions.
Finally, a fully open-source, reproducible system for long-form song generation is here, released with licensed training data, code, and a Qwen-based model that rivals closed-source systems.