Search papers, labs, and topics across Lattice.
Peking University
2
0
4
Multi-event video generation gets a 33% quality boost with TS-Attn, a training-free attention mechanism that dynamically aligns video content with complex temporal prompts.
A reward model trained on spatial relationship preferences beats proprietary models at spatial understanding in text-to-image generation, and unlocks better RL-based image generation.