Xiaojuan Qi

Papers on Lattice

Research focus

Multimodal Models (3) · Computer Vision (2) · Architecture Design (Transformers, SSMs, MoE) (1) · Robotics & Embodied AI (1) · Tool Use & Agents (1)

Frequent co-authors

Sudong Wang (2) · Shijian Lu (2) · Lidong Bing (2) · Shihao Han (1)

Papers (3)

May 28, 2026

Veda: Scalable Video Diffusion via Distilled Sparse Attention

Surprisingly, high attention sparsity in video diffusion models doesn't degrade generation quality, as long as the sparse mask accurately mimics the tile-wise geometry of full attention.

Shihao Han, Xinting Hu, Xiaofeng Mei +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

May 19, 2026 · evolvinglmms-lab.github.io/ParaVT · HKUST

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

Fine-tuning LMMs with RL for tool use can make their structured output formats collapse because of strong pretrained tool priors; a surprisingly simple fix, combining targeted format rewards with frame-budget randomization, can restore stability and boost performance.

Zuhao Yang, Kaichen Zhang, Sudong Wang +6

Multimodal Models · Robotics & Embodied AI · Tool Use & Agents

Apr 30, 2026 · DAMO, HKUST, NTU

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Today's visual generation models are often evaluated on the wrong criteria, which inflates performance claims and masks critical failures in spatial reasoning, temporal consistency, and causal understanding.

Keming Wu, Zuhao Yang, Kaichen Zhang +28

Computer Vision · Multimodal Models · World Models & Planning
