Zhihang Zhong

Shanghai Jiao Tong University

Publication activity (papers/week, last 8 weeks)

Research focus

Multimodal Models (6), Computer Vision (5), World Models & Planning (2), Tool Use & Agents (2)

Frequent co-authors

Xue Yang (3), Yifei Liu (2), Yuning Gong (2), Hongjie Zhang (2)

Papers (7)

Jun 9, 2026

1d ago·also AI Laboratory, SJTU

Segment and Select: Vision-Language Segmentation in 3D Scenarios

SEGA3D achieves an 8.3 mIoU improvement over previous methods on 3D vision-language segmentation.

Yulin Chen, Zhihang Zhong, Yuenan Hou

Computer Vision, Multimodal Models

1d ago·also SJTU

CoCoSI: Collaborative Cognitive Map Construction for Spatial Intelligence

Spatial intelligence in MLLMs can be enhanced without any architectural modifications or retraining by using a collaborative cognitive mapping approach.

Yiming Zhang, Ruoxuan Cao, Zhihang Zhong

Multimodal Models, World Models & Planning

Jun 1, 2026

1w ago·also AI Laboratory, Northwestern, SEU, SJTU +4

Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events

Current video MLLMs struggle to grasp fleeting visual events, with top models barely surpassing 39% accuracy on critical momentary tasks.

Xiaolin Liu, Yilun Zhu, Xiangyu Zhao +9

Computer Vision, Multimodal Models

May 22, 2026

2w ago·also AI Laboratory, Cornell, Northeastern, PhotoFlow +3

PhotoFlow: Agentic 3D Virtual Photography Missions

LLM-powered agents can now produce surprisingly strong photographs in complex 3D environments, suggesting a path towards embodied AI with aesthetic awareness.

Jiarui Guo, Haojia Wei, Yifei Liu +4

Computer Vision, Multimodal Models, Tool Use & Agents

May 21, 2026

2w ago·also AI Laboratory, Beihang, Chongqing +4

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Visual degradations can cripple the spatial reasoning abilities of even state-of-the-art MLLMs, but targeted finetuning can restore, and even surpass, human-level performance.

Xiaolong Zhou, Yifei Liu, Ziyang Gong +6

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

May 4, 2026

May 4, 2026·also HKUST, PhotoFlow, SCU, SJTU +2

Perceptual Flow Network for Visually Grounded Reasoning

LVLMs can achieve SOTA visual reasoning by learning to "see" in a way that optimizes for reasoning, even if it means deviating from strict geometric accuracy.

Yangfu Li, Yuning Gong, Hongjian Zhan +7

Computer Vision, Multimodal Models, Reasoning & Chain-of-Thought

Apr 7, 2026

Apr 7, 2026·also SJTU

COSMO-Agent: Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration

Small, open-source LLMs can now outperform larger, closed-source models in complex industrial design tasks by learning to orchestrate CAD/CAE tools within a reinforcement learning framework.

Liyuan Deng, Shujian Deng, Yongkang Chen +6

Robotics & Embodied AI, Tool Use & Agents, World Models & Planning
