Yuning Gong

PhotoFlow, Sichuan University, visionary-laboratory

Research focus

Computer Vision (5), Multimodal Models (5), Reasoning & Chain-of-Thought (2), Tool Use & Agents (1), World Models & Planning (1)

Frequent co-authors

Zhihang Zhong (2), Yangfu Li (2), Hongjian Zhan (2), Yue Lu (2)

Papers (5)

May 22, 2026 · also Cornell, Northeastern, PhotoFlow, SCU +3

PhotoFlow: Agentic 3D Virtual Photography Missions

LLM-powered agents can now produce surprisingly strong photographs in complex 3D environments, suggesting a path towards embodied AI with aesthetic awareness.

Jiarui Guo, Haojia Wei, Yifei Liu +3

Computer Vision, Multimodal Models, Tool Use & Agents

May 4, 2026 · also HKUST, PhotoFlow, SCU, visionary-laboratory

Perceptual Flow Network for Visually Grounded Reasoning

LVLMs can achieve SOTA visual reasoning by learning to "see" in a way that optimizes for reasoning, even if it means deviating from strict geometric accuracy.

Yangfu Li, Yuning Gong, Hongjian Zhan +4

Computer Vision, Multimodal Models, Reasoning & Chain-of-Thought

Tsinghua AI · Apr 15, 2026 · also PhotoFlow, SCU, visionary-laboratory

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Imagine creating high-fidelity, navigable 3D worlds from just a text prompt or a single image – HY-World 2.0 makes it a reality.

Team HY-World, Chenjie Cao, Xuhui Zuo +39

Computer Vision, Multimodal Models, World Models & Planning

PhotoFlow · Mar 8, 2026 · also Northwestern, SCU, visionary-laboratory

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

Forget hand-annotated 3D datasets: a new automated pipeline generates massive, high-quality 3D spatial intelligence from raw video, unlocking better VLM reasoning.

Yuning Gong, Yuanjun Liao, Fangfu Liu +1

Computer Vision, Data Curation & Synthetic Data, Multimodal Models

Mar 4, 2026 · also PhotoFlow, SCU, visionary-laboratory

DeepScan: A Training-Free Framework for Visually Grounded Reasoning in Large Vision-Language Models

Forget training wheels: DeepScan unlocks significant gains in LVLM visual reasoning *without* any additional training, achieving state-of-the-art results on V*.

Yangfu Li, Hongjian Zhan, Yuning Gong +1

Computer Vision, Multimodal Models, Reasoning & Chain-of-Thought
