Guang Chen

Transient distractions can severely degrade scene reconstruction, but MU-GeNeRF effectively mitigates their impact, achieving results on par with specialized methods.

Wenjie Mu, Chuanzhou Su, Xuanyi Shen +4

Computer Vision Multimodal Models

Apr 20, 2026 · also BAAI, Rimbot, Xiaomi EV

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments

Endowing VLMs with intrinsic 3D geometric awareness and physical interaction cues via XEmbodied substantially boosts performance on spatial reasoning and embodied tasks, surpassing existing 2D image-text pretrained models.

Kangan Qian, ChuChu Xie, Yang Zhong +11

Computer Vision Multimodal Models Robotics & Embodied AI

Jinghui Lu +51 · Apr 20, 2026 · also CAS, HKU, NJU +2

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

Latent reasoning can beat explicit Chain-of-Thought – but only if you force it to learn causal dynamics via a visual world model, not just language.

Jinghui Lu, Jiayi Guan, Zhijian Huang +49

Multimodal Models Reasoning & Chain-of-Thought World Models & Planning

Sunyao Zhou +8 · Apr 14, 2026 · also Fudan, Shanghai Innovation, Xiaomi EV, ZJU

DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

Forget static coordination – robots that chat and dynamically re-plan can achieve a whopping 69% improvement in collaborative navigation success.

Sunyao Zhou, Yunzi Wu, Tianhang Wang +6

Computer Vision Multimodal Models Robotics & Embodied AI

Apr 5, 2026 · also Microsoft Research, Cambridge +1

DriveVA: Video Action Models are Zero-Shot Drivers

Autonomous driving models can now achieve remarkable zero-shot generalization by leveraging the power of large-scale video generation models to jointly predict future actions and visuals.

Mengmeng Liu, Diankun Zhang, Jiuming Liu +4

Computer Vision Robotics & Embodied AI World Models & Planning

Yongkang Li +14 · Apr 2, 2026 · also Xiaomi EV

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

Autonomous driving models no longer need to compromise between spatial perception and semantic reasoning: UniDriveVLA's expert decoupling unlocks state-of-the-art performance across a range of driving tasks.

Yongkang Li, Lijun Zhou, Sixu Yan +12

Multimodal Models Robotics & Embodied AI World Models & Planning

Papers (7)