Today's visual generation models excel at photorealism but still fail at the kind of spatial reasoning, long-term consistency, and causal understanding that truly intelligent visual generation demands.
Adversarial training doesn't have to hurt speaker verification: by explicitly modeling language, you can disentangle speaker and language characteristics without sacrificing speaker discriminability.
Instruction-guided video editing can achieve impressive zero-shot performance simply by pre-training on motion-centric video restoration tasks *before* fine-tuning on paired editing data.