Department of Artificial Intelligence, Korea University
Achieving fine-grained semantic alignment in text-to-video generation is now possible with a model that explicitly verifies every prompt condition against visual evidence.
Decomposing text prompts into semantic units and using visual question answering (VQA) for fine-grained self-reflection improves image generation quality, especially for complex compositions.