Yuxin Peng

Peking University

Research focus

Multimodal Models (7) · Computer Vision (5) · Eval Frameworks & Benchmarks (3) · Reasoning & Chain-of-Thought (2)

Frequent co-authors

Minghang Zheng (2) · Zihao Yin (2) · Yang Liu (2) · Ting Lei (1)

Papers (7)

Jul 15, 2026

Also: Tsinghua AI, UW-Madison

Unleashing Multimodal Large Language Models for Training-free HOI Detection in the Wild

Training-free HOI detection achieves superior performance by harnessing the multimodal reasoning of foundation models, challenging the need for dataset-specific supervision.

Ting Lei, Jialin Liu, Zhu Xu +1

Computer Vision · Multimodal Models

Jun 25, 2026

DiCoBench: Benchmarking Multi-Image Fine-Grained Perception via Differential and Commonality Visual Cues

MLLMs struggle to match human accuracy in fine-grained perception, with a striking performance gap revealed by the new DiCoBench benchmark.

Yuxin Peng

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Jun 17, 2026

Peking University · also Tsinghua AI, National Institute of Health Data, PKU

Taming I2V models for Image HOI Editing: A Cognitive Benchmark and Agentic Self-Correcting Framework

Image-to-video (I2V) models not only excel at dynamic editing but also provide a unique lens for diagnosing errors in Human-Object Interaction tasks.

Jiayi Gao, Qingchao Chen, Yuxin Peng

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Hong-Tao Yu +3 · also PKU

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: From Evaluation to Diagnosis

Current LVLMs struggle with fine-grained image recognition, revealing critical bottlenecks in visual and semantic processing that need urgent attention.

Hong-Tao Yu, Yuxin Peng, Serge Belongie +1

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Jun 8, 2026

Wangxuan Institute of Computer · also Tsinghua AI, PKU

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

Superficial reasoning in video temporal grounding can be transformed into high-quality, time-aware insights with the right optimization framework.

Minghang Zheng, Zihao Yin, Yuxin Peng +1

Multimodal Models · Reasoning & Chain-of-Thought · RLHF & Preference Learning

May 21, 2026

Tianxiang Du +2 · also PKU

AesFormer: Transform Everyday Photos into Beautiful Memories

You can now automatically transform structurally flawed photos into aesthetically pleasing images, thanks to a new framework that plans and executes edits based on photographic principles.

Tianxiang Du, Hulingxiao He, Yuxin Peng

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Apr 28, 2026

Wangxuan Institute of Computer · also Tsinghua AI, Huawei, PKU

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding

MLLMs are better at understanding videos than directly grounding text queries within them, and a self-correction training loop can close the gap.

Minghang Zheng, Zihao Yin, Yi Yang +3

Data Curation & Synthetic Data · Multimodal Models · Reasoning & Chain-of-Thought