Current MLLMs are still surprisingly reliant on textual reasoning, even when visual information is crucial for solving STEM problems.
Existing video datasets fail to capture the complexity of human interactions in diverse scenes, but OmniHuman offers a new benchmark to train and evaluate models on more realistic human-centric video generation.
Over 20 teams vied to decode human attention in video, revealing new insights into saliency prediction techniques.
Achieve an 8x speedup in chest X-ray report generation without sacrificing clinical accuracy by distilling multi-step diffusion into a single, efficient step.
Telecom World Models fuse the flexibility of LLMs with the fidelity of Digital Twins, enabling uncertainty-aware predictive planning that existing approaches can't match.
Chorus unlocks 45% speedups in video diffusion inference by cleverly reusing computations across user requests, even in highly optimized 4-step models where traditional caching fails.
Finally, a unified framework lets you control both facial appearance and voice timbre for personalized audio-video generation across multiple identities.
Software traceability research is severely imbalanced, with code-related links dominating and 95% of tools stuck in academia.
Forget treating document graphics as mere pixels: this new OCR system parses them into reusable code, unlocking multimodal supervision and outperforming existing systems.
Forget MACs and parameters: accurately predict DL model energy and latency on MCUs with 3x and 6.5x lower error using just clock cycles.
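The idea of predicting latency from clock cycles rather than MAC counts can be illustrated with a minimal sketch: fit a least-squares line from measured cycle counts to measured latencies, then use it for new models. The data values and function names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical measurements: clock cycles and latency (ms) for a few
# DL models on an MCU. Values are illustrative only.
clock_cycles = np.array([1.2e6, 3.5e6, 8.0e6, 1.5e7])
latency_ms = np.array([15.0, 43.8, 100.0, 187.5])

# Fit a simple least-squares line: latency ~ a * cycles + b.
a, b = np.polyfit(clock_cycles, latency_ms, deg=1)

def predict_latency_ms(cycles: float) -> float:
    """Predict latency on this MCU from a profiled cycle count."""
    return a * cycles + b

print(round(predict_latency_ms(5e6), 1))  # ~62.5 ms for this toy data
```

Because cycle counts already reflect the target's memory stalls and instruction mix, even a linear model like this can track latency far more closely than MAC- or parameter-based proxies.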
By dynamically orchestrating tools and recalling past reasoning, an LLM agent can boost phishing detection recall by 20% on real-world social media URLs.
Reconstructing surgical scenes from monocular endoscope videos with large camera motion just got a whole lot better, thanks to a new window-based approach that doesn't need stereo depth or perfect camera tracking.
LLMs can now reason effectively about complex agricultural scenarios by iteratively writing and executing code within a specialized environment, outperforming traditional text-based approaches.
Forget fine-tuning: DM0 shows that pretraining a VLA model from scratch on diverse embodied and non-embodied data leads to SOTA performance in physical AI tasks.
LLMs can be pruned 4x faster without sacrificing performance thanks to a new gradient-based metric and projection compensation technique.
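A gradient-based pruning criterion of the general kind referenced here can be sketched as a first-order saliency score, |w · dL/dw|, which approximates the loss change from zeroing each weight. This is a generic illustration with synthetic data; the paper's exact metric and its projection compensation step are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrix and its loss gradient (synthetic values).
W = rng.normal(size=(4, 8))
G = rng.normal(size=(4, 8))

# First-order saliency: |w * dL/dw| estimates the loss increase
# caused by pruning each individual weight.
saliency = np.abs(W * G)

# Prune the 50% of weights with the lowest saliency.
threshold = np.quantile(saliency, 0.5)
mask = saliency >= threshold
W_pruned = W * mask

print(f"fraction of weights kept: {mask.mean():.2f}")
```

Scoring weights with a single gradient pass is what makes such metrics cheap relative to iterative retrain-and-prune loops, which is consistent with the claimed speedup.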