Self-supervised video models can now learn dense features rivaling supervised methods, unlocking a 20-point jump in robot grasping success.
Vision models are far more data-hungry than language models, but Mixture-of-Experts architectures can reconcile this asymmetry, enabling truly unified multimodal models.