Search papers, labs, and topics across Lattice.
2
0
6
Keyframe-residual captioning unlocks high-fidelity video-language supervision, surpassing direct VLM captioning in capturing fine-grained visual details.
A 1.7B parameter model can now rival much larger audio language models, thanks to a novel architecture and data synthesis pipeline.