Xiang Bai

Publication activity (papers/week, last 8 weeks)

Research focus

Computer Vision (6) · Multimodal Models (5) · Reasoning & Chain-of-Thought (2) · World Models & Planning (2)

Frequent co-authors

Dingkang Liang (3) · Zhenbo Luo (2) · Pei Fu (2) · Jian Luan (2)

Papers (8)

Jul 9, 2026

Pengjie Wang +10 · 4d ago

DeltaV: Thinking with Visual State Updates in Unified Large Multimodal Models

Visual updates in DeltaV cut token generation by over half while boosting reasoning accuracy, challenging the need for full-image outputs in multimodal models.

Pengjie Wang, Linger Deng, Zujian Zhang +8

Multimodal Models · Reasoning & Chain-of-Thought

Jiangwei Ren +7 · 4d ago

Wat3R: Underwater 3D Geometry Learning without Annotations

Wat3R achieves superior underwater 3D geometry estimation without any annotated data, leveraging unlabeled footage to overcome the challenges of light attenuation and scattering.

Jiangwei Ren, Xingyu Jiang, Zijie Song +5

Computer Vision

Apr 30, 2026

Xin Zhou +6 · Apr 30, 2026

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

HERMES++ achieves state-of-the-art performance in both future point cloud prediction and 3D scene understanding by unifying these tasks within a single driving world model.

Xin Zhou, Dingkang Liang, Xiwu Chen +4

Computer Vision · Robotics & Embodied AI · World Models & Planning

Apr 21, 2026

Huazhong Agricultural University · Apr 21, 2026 · also HUST

DINO Eats CLIP: Adapting Beyond Knowns for Open-set 3D Object Retrieval

DINO, not CLIP, might be the better foundation for open-set 3D object retrieval, especially when paired with dynamic view integration and virtual feature synthesis to avoid overfitting.

Xinwei He, Yansong Zheng, Qianru Han +7

Computer Vision · Multimodal Models · Recommendation & Information Retrieval

Apr 15, 2026

Yuanlei Zheng +9 · Apr 15, 2026 · also Aston

Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA

Doc-V* demonstrates that an agentic approach to multi-page document VQA, using active navigation and structured memory, can significantly outperform retrieval-augmented generation, especially in out-of-domain scenarios.

Yuanlei Zheng, Pei Fu, Hang Li +7

Multimodal Models · Reasoning & Chain-of-Thought · Tool Use & Agents

Apr 9, 2026

Zhengyang Sun +7 · Apr 9, 2026

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

Text-to-video diffusion models can now count (more accurately) without retraining, thanks to a clever attention-based guidance method.

Zhengyang Sun, Zheng‐Wei Sun, Yu Chen +5

Computer Vision · Multimodal Models

Apr 6, 2026

PointTPA: Dynamic Network Parameter Adaptation for 3D Scene Understanding

PointTPA achieves state-of-the-art 3D scene understanding by dynamically adapting network parameters at test time, showing that input-aware adjustments can significantly boost performance with minimal overhead.

Chaoqun Zheng, Dingkang Liang, Xiang Bai

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Mar 26, 2026

Kai Chen +6 · Mar 26, 2026

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

World models can now remember and realistically regenerate dynamic objects that temporarily disappear from view, thanks to a novel hybrid memory architecture.

Kai Chen, Dingkang Liang, Xin Zhou +4

Computer Vision · Multimodal Models · World Models & Planning
