VLMs can learn to actively reason and plan in 3D environments by distilling view graphs from self-exploration trajectories, enabling them to surpass even larger models like GPT-4 Pro and Gemini 1.5 Pro on interactive view planning.
Hierarchical planning and self-reflection can finally wrangle AIGC tools into producing coherent, visually consistent webpages.
Forget text-centric pipelines: FlowInOne achieves SOTA multimodal generation by unifying text, layouts, and instructions into a single visual flow, outperforming both open-source and commercial systems.
LLM agents can appear to reason well (high entropy) while completely ignoring the input, and mutual information is a far better metric for catching this failure.
Current image generation models fall far short of the structured, multi-constraint demands of real-world commercial design, as a new systematic benchmark reveals.