Ming-Yu Liu

AutoVSR achieves up to 59.45% higher accuracy in generating symbolic expressions from circuit schematics, offering a new way to interpret circuit behavior.

Zhe Xiao, Longfei Li, Xu He +2

Multimodal Models · Reasoning & Chain-of-Thought

Jun 26, 2026

Jun 26, 2026 · also NVIDIA, Tencent AI

PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

Physically aligned video models can boost robotic manipulation success rates by over 50% compared to traditional methods.

Peiwen Zhang, Yufan Deng, Shangkun Sun +8

Robotics & Embodied AI · World Models & Planning

Jun 24, 2026

Tsinghua AI · Jun 24, 2026 · also SJTU, UT Austin

Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models

Teacher-forcing consistency models can accelerate autoregressive video generation tenfold, changing how models are trained for streaming applications.

Kaiwen Zheng, Guande He, Min Zhao +7

Computer Vision · World Models & Planning

Jun 17, 2026

Jun 17, 2026 · also AI2, BAIR, Stanford HAI, PI +1

SC3-Eval: Evaluating Robot Foundation Models via Self-Consistent Video Generation

SC3-Eval achieves a 0.929 Pearson correlation in evaluating robot policies, revealing insights into their real-world performance.

Wei-Cheng Tseng, Gashon Hussein, Yuzhu Dong +9

Robotics & Embodied AI · World Models & Planning

Jun 14, 2026

Jun 14, 2026 · also Fudan, HKUST, NJU, Shanghai Innovation

Perfect Demo Makes Poor Teacher: Learning Robust Alignment from Critical Motion Segments

Fluent demonstrations may mislead robot learning, but a new representation method recovers critical motion insights, boosting performance significantly.

Ming-Yu Liu, Zeju Li, Jiuhe Shu +1

Robotics & Embodied AI

May 28, 2026

Tingle Li +8 · May 28, 2026

Benchmarking Single-Factor Physical Video-to-Audio Generation

V2A models prioritize text captions over visual cues when generating audio, resulting in physically plausible but often temporally misaligned sounds.

Tingle Li, Siddharth Gururani, Kevin J. Shih +6

Eval Frameworks & Benchmarks · Multimodal Models · Speech & Audio

Apr 22, 2026

Apr 22, 2026 · also University of California

CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs

Video LLMs can answer individual traffic video questions correctly yet still fail at subtle counterfactual reasoning, revealing a blind spot for safety-critical applications.

Xingcheng Zhou, Hao Guo, Walter Zimmer +3

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Apr 21, 2026

Hao Li +31 · Apr 21, 2026 · also CAS, HIT, OPPO, PolyU +2

LoViF 2026 Challenge on Real-World All-in-One Image Restoration: Methods and Results

Unified benchmarks reveal the state-of-the-art in simultaneously addressing multiple real-world image degradations like blur, low-light, and rain.

Hao Li, Naiwei Chen, Shengyuan Li +29

Computer Vision · Eval Frameworks & Benchmarks

Apr 13, 2026

NVIDIA · Apr 13, 2026 · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information, UMD

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Audio-language models can now reason about 30-minute-long audio clips with timestamp-grounded intermediate steps, unlocking a new level of fine-grained understanding.

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar +17

Multimodal Models · Open-Source Models & Weights · Speech & Audio

Papers (10)