Mengshi Qi

Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications

Research focus

Multimodal Models (6), Reasoning & Chain-of-Thought (4), Computer Vision (2), Code Generation & Program Synthesis (1)

Frequent co-authors

Huadong Ma (4), Wei Deng (3), Xiaoyang Bi (2), Shuaikun Liu (2)

Papers (7)

Jun 9, 2026

1d ago·also Beijing Key Laboratory of Intelligent

Leveraging Metric Depth for Relative Depth Prediction

Achieving a score of $2.68 \times 10^{-3}$ in a depth estimation challenge reveals the untapped potential of zero-shot learning in complex visual tasks.

Xiaoyang Bi, Shuaikun Liu, Zhaohong Liu +4

Computer Vision, Multimodal Models

Jun 8, 2026

2d ago·also Beijing Key Laboratory of Intelligent

Claude Code-Driving Scenario Mining for the Argoverse 2 Challenge

Autonomous code generation combined with rigorous semantic review can drastically enhance scenario mining accuracy in complex driving environments.

Wei Deng, Caoshengzhe Xue, Shuaikun Liu +3

Code Generation & Program Synthesis, Data Curation & Synthetic Data, Robotics & Embodied AI

State Key Laboratory of Networking and Switching·2d ago·also Beijing Key Laboratory of Intelligent, BUPT

A VideoMAE-v2 Approach to Zero-Shot Traffic Accident Anticipation

Zero-shot learning can now predict traffic accidents in real-time without the need for costly annotated datasets, achieving competitive results in a major competition.

Siyuan Li, Xiaoyang Bi, Mengshi Qi

Computer Vision, Multimodal Models

Jun 4, 2026

Beijing Key Laboratory of Intelligent·6d ago·also BUPT

Global-Local Monte Carlo Tree Search in Vision-Language Models for Text-to-3D Indoor Scene Generation

By rethinking text-to-3D generation as a planning problem, this approach significantly reduces error propagation and enhances scene realism.

Mengshi Qi, Wei Deng, Xianlin Zhang +1

Multimodal Models, Reasoning & Chain-of-Thought, World Models & Planning

Jun 1, 2026

1w ago·also Beijing Key Laboratory of Intelligent

Reason-Then-Retrieve for CoVR-R with Structured Edit Prompts and Dense-Sparse Fusion

Achieving nearly 90% accuracy in retrieving videos based on nuanced edit instructions could redefine standards in video retrieval systems.

DongQing Liu, Mengshi Qi, HongWei Ji

Multimodal Models, Reasoning & Chain-of-Thought, Recommendation & Information Retrieval

1w ago·also Beijing Key Laboratory of Intelligent

Question-Aware Evidence Ledgers for Video Relational Reasoning

Achieving nearly 93% accuracy in video relational reasoning, this approach reveals how structured evidence can dramatically enhance model performance in complex visual contexts.

Yilin Ou, Mengshi Qi, Huadong Ma

Multimodal Models, Reasoning & Chain-of-Thought

1w ago·also Beijing Key Laboratory of Intelligent

Active Exploring like a Pigeon: Reinforcing Spatial Reasoning via Agentic Vision-Language Models

Transforming VLMs into active agents with cognitive maps leads to a staggering 53.2% boost in spatial reasoning accuracy.

Wei Deng, Xianlin Zhang, Mengshi Qi

Multimodal Models, Reasoning & Chain-of-Thought, Tool Use & Agents