Yuanming Li

Publication activity (papers/week, last 8 weeks)

Research focus

Multimodal Models (5) · Computer Vision (4) · Inference & Quantization (1) · Reasoning & Chain-of-Thought (1)

Frequent co-authors

Ce Chen (3) · Yi Ren (3) · Zujin Guo (3) · V. Goriachko (2)

Papers (5)

Jun 17, 2026

Pengyu Li +5 · 3w ago

Visual-OPSD: Cross-Modal On-Policy Self-Distillation for Efficient Unified Multimodal Reasoning

Visual-OPSD achieves a 14.3x speedup while improving accuracy by over 3 percentage points, revealing the untapped potential of reasoning in visual thought generation.

Pengyu Li, Zhitao Gao, Lingling Zhang +3

Inference & Quantization · Multimodal Models · Reasoning & Chain-of-Thought

Jun 14, 2026

MIT CSAIL · Jun 14, 2026 · also La Trobe

SpatialAvatar-0: High-Quality 4D Head Avatar with Multi-Stage Reconstruction

Surpassing existing benchmarks, SpatialAvatar-0 achieves superior 4D head avatar quality with up to 60x fewer training iterations than traditional methods.

Yiran Wang, Yiran Wang, Zeyu Zhang +7

Computer Vision · Multimodal Models

Jun 11, 2026

Benjamin Liang +21 · Jun 11, 2026

Avatar V: Scaling Video-Reference Avatar Video Generation

Avatar V achieves unprecedented avatar video fidelity by directly conditioning on full video references, capturing dynamic behaviors that previous methods missed.

Benjamin Liang, Ce Chen, Desmond Lin +19

Computer Vision · Multimodal Models

Apr 30, 2026

Zujin Guo +6 · Apr 30, 2026 · also S-Lab

Generate Your Talking Avatar from Video Reference

Ditch the static image: this method generates realistic talking avatars by learning from *videos* of the subject in completely different scenes.

Zujin Guo, Zhenhui Ye, Yi Ren +4

Computer Vision · Multimodal Models · Speech & Audio

Ce Chen +8 · Apr 30, 2026 · also HeyGen Research

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

Injecting optical flow into VLMs lets them spot subtle video transitions that other methods miss, opening the door to more robust video understanding.

Ce Chen, Yi Ren, Yuanming Li +6

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models
