PhotoFlow, Sichuan University https, visionary-laboratory
LLM-powered agents can now produce surprisingly strong photographs in complex 3D environments, suggesting a path towards embodied AI with aesthetic awareness.
LVLMs can achieve SOTA visual reasoning by learning to "see" in a way that optimizes for reasoning, even if it means deviating from strict geometric accuracy.
Imagine creating high-fidelity, navigable 3D worlds from just a text prompt or a single image – HY-World 2.0 makes it a reality.
Forget hand-annotated 3D datasets: a new automated pipeline generates massive, high-quality 3D spatial intelligence from raw video, unlocking better VLM reasoning.
Forget training wheels: DeepScan unlocks significant gains in LVLM visual reasoning *without* any additional training, achieving state-of-the-art results on V*.