Encoding temporal prediction into video VAEs unlocks faster training, better generative performance, and improved downstream task performance, all at once.
Jointly training the tokenizer and autoregressive model slashes ImageNet FID to 1.48, finally making end-to-end autoregressive image generation competitive.