Chang Xu

Pruning VLA models can reduce parameters by up to 30% while retaining 90% of performance, challenging the notion of parameter redundancy in these complex systems.

Fengnian Zhang, Tao Huang, Siyu Xu +2

Inference & Quantization · Multimodal Models · Training Efficiency & Optimization

May 26, 2026 · also Corresponding author, CUHK

PARE: Pruning and Adaptive Routing for Efficient Video Generation

By intelligently pruning attention heads based on their spatial or temporal roles and adaptively routing denoising steps through the network, PARE achieves significant computational savings in video generation without sacrificing quality.

Yunke Wang, Yu Qiao, Yaohui Wang +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

May 26, 2026 · also Corresponding author

BEAT: Rhythm-Elastic Alignment for Agentic Music-guided Movie Trailer Generation

Forget rigid shot-music mappings: BEAT's elastic alignment framework finally captures the dynamic rhythm of professional movie trailer editing.

Yunke Wang, Xinyuan Chen, Chang Xu

Multimodal Models · Speech & Audio · Tool Use & Agents

May 21, 2026 · also HKU, Normal University

Sibyl-AutoResearch: Autonomous Research Needs Self-Evolving Trial-and-Error Harnesses, Not Paper Generators

Autonomous research agents need to learn from their mistakes and adapt rather than just generate papers; this framework shows how to make that happen.

Chengcheng Wang, Qinhua Xie, Jianyuan Guo +1

Scientific Discovery & Drug Design · Tool Use & Agents

Apr 22, 2026

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

Task-aware localization, using attention cues from both source and target image streams, significantly reduces over-editing in instruction-based image editing, even when applied to strong diffusion transformer backbones.

Xiyu Wang, Yunke Wang, Chang Xu

Computer Vision · Multimodal Models · Natural Language Processing

Yuheng Shi +7 · Apr 8, 2026 · also HKU

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

MLLMs can achieve 4x faster inference without sacrificing accuracy by focusing only on the image regions relevant to the query.

Yuheng Shi, Xiaohuan Pei +6

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Multimodal Models

Papers (7)