Forget brute-force context windows: a small vision-language model can compress hour-long videos below theoretical limits by intelligently prioritizing relevant content.
Vision models are far more data-hungry than language models, but Mixture-of-Experts can harmonize this asymmetry for truly unified multimodal models.