Nan Duan

Papers on Lattice

Publication activity (papers/week, last 8 weeks)

Research focus

Multimodal Models (4) · Computer Vision (3) · Inference & Quantization (2) · Training Efficiency & Optimization (2)

Frequent co-authors

Haoyang Huang (4) · Hang Xu (2) · Zeyue Xue (2) · Siming Fu (2)

Papers (7)

May 5, 2026

JD Explore Academy · May 5, 2026 · also CAS, HKU, K &, Robotics +2

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

Bidirectional interaction between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables a unified multimodal model to achieve spatial intelligence beyond general visual competence.

Lin Song, Guoqing Ma, Bo Wang +14

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Apr 28, 2026

Zeyue Xue +11 · Apr 28, 2026

A Systematic Post-Train Framework for Video Generation

Unlock the full potential of your pretrained video diffusion models with a surprisingly simple four-stage post-training framework that drastically improves visual quality, temporal coherence, and instruction following.

Zeyue Xue, Siming Fu, Jie Huang +9

Computer Vision · Inference & Quantization · Training Efficiency & Optimization

Apr 22, 2026

Apr 22, 2026 · also BAAI

Near-Future Policy Optimization

Forget external teachers – the best way to boost your RL model's performance is to learn from its future self.

Chuanyu Qin, Chenxu Yang, Chen Yang +9

RLHF & Preference Learning · Training Efficiency & Optimization

Tianle Zhang +58 · Apr 22, 2026 · also BC Cancer Agency, JD Group JD Technology, Northeastern, UBC +1

JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

Bridging the gap between human manipulation and robotic control, JoyAI-RA unlocks enhanced cross-embodiment behavior learning through multi-source pretraining.

Tianle Zhang, Zhihao Yuan, Dafeng Chi +56

Data Curation & Synthetic Data · Multimodal Models · Robotics & Embodied AI

Apr 18, 2026

EasyVideoR1: Easier RL for Video Understanding

EasyVideoR1 achieves a 1.47× throughput improvement in video understanding tasks by eliminating redundant video decoding and leveraging a comprehensive task-aware reward system.

Chuanyu Qin, Chenxu Yang, Qingyi Si +4

Computer Vision · Multimodal Models · RLHF & Preference Learning

Apr 8, 2026

Jianhui Liu +13 · Apr 8, 2026 · also Huawei

OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

Spatial reasoning gets a major boost: OpenSpatial-3M, a new dataset, enables models to leapfrog existing benchmarks by 19%.

Jianhui Liu, Haoze Sun, Wenbo Li +11

Data Curation & Synthetic Data · Open-Source Models & Weights · Robotics & Embodied AI

Mar 12, 2026

Mar 12, 2026 · also NVIDIA, SJTU, ZJU

OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

Achieve real-time, synchronized audio-visual generation at 25 FPS by distilling a bidirectional diffusion model into a fast, autoregressive architecture, overcoming training instability with novel alignment and token handling techniques.

Yaofeng Su, Yuming Li, Yuming Li +8

Inference & Quantization · Multimodal Models · Speech & Audio
