Search papers, labs, and topics across Lattice.
3
0
5
4
Achieve 3x faster video captioning without sacrificing accuracy by swapping quadratic attention for a linear Mamba backbone and hierarchical bidirectional scanning.
Training VLMs on Jagle, the largest Japanese multimodal dataset, not only crushes existing models on Japanese tasks, but *also* boosts English performance when combined with English data.