Search papers, labs, and topics across Lattice.
2
0
4
3
Forget training on long videos – PackForcing achieves state-of-the-art long-video generation by cleverly compressing the KV-cache into Sink, Mid, and Recent tokens, enabling 24x temporal extrapolation from short-video training.
Vision models are far more data-hungry than language models, but Mixture-of-Experts can harmonize this asymmetry for truly unified multimodal models.