Search papers, labs, and topics across Lattice.
5
0
8
5
Achieve 3x faster video captioning without sacrificing accuracy by swapping quadratic attention for a linear Mamba backbone and hierarchical bidirectional scanning.
Training VLMs on Jagle, the largest Japanese multimodal dataset, not only crushes existing models on Japanese tasks, but *also* boosts English performance when combined with English data.
Japanese VQA benchmarks are riddled with issues that lead to misleading model comparisons, but JAMMEval fixes this with a rigorous, two-stage refinement process.
Optimizing multilingual training? Shapley values reveal the hidden cross-lingual transfer effects that current scaling laws miss, leading to better language mixture ratios.