Zhanyu Ma

Beijing University of Posts and Telecommunications, Beijing Key Laboratory of Multimodal Data Intelligent Perception and Governance

Research focus

Multimodal Models (3) · Eval Frameworks & Benchmarks (2) · Computer Vision (1) · Training Efficiency & Optimization (1)

Frequent co-authors

Kongming Liang (2) · Zilu Zhou (1) · Dongliang Chang (1) · Junhan Chen (1)

Papers (4)

Jul 20, 2026

6d ago · also Beijing Key Laboratory of Multimodal

FlexiGrad: Adaptive Gradient Modulation for Hierarchical Fine-Grained Classification

FlexiGrad eliminates harmful gradient conflicts, leading to more stable training and improved accuracy in hierarchical fine-grained classification tasks.

Zilu Zhou, Dongliang Chang, Junhan Chen +1

Computer Vision · Training Efficiency & Optimization

Jul 15, 2026

Songyu Xu +8 · 1w ago · also Beijing Key Laboratory of Multimodal, BUPT, Corresponding author, NII

VGIF-Score: Interpretable and Diagnostic Evaluation of Spatio-Temporal Instruction Following in Video Generation

VGIF-Score reveals that current video generation models struggle with complex instructions, providing a diagnostic lens to pinpoint where they succeed or fail.

Songyu Xu, Xin Wang, Qiang Chen +6

Eval Frameworks & Benchmarks · Multimodal Models

Jul 2, 2026

Yuanzhi Liu +4 · 3w ago · also Beijing Key Laboratory of Multimodal, BUPT

MMBench-Live: A Continuously Evolving Benchmark for Multimodal Models

MMBench-Live achieves a high answer correctness rate while updating the benchmark at a fraction of the cost and time, offering a continuously refreshed way to assess VLMs.

Yuanzhi Liu, Shousheng Zhao, Bo Zhou +2

Data Curation & Synthetic Data · Eval Frameworks & Benchmarks · Multimodal Models

Jun 9, 2026

Jun 9, 2026 · also Beijing Key Laboratory of Multimodal, BUPT, Corresponding authors, Meituan

AuRA: Internalizing Audio Understanding into LLMs as LoRA

AuRA achieves superior performance in speech-language tasks by seamlessly integrating audio understanding into LLMs, outperforming traditional methods in both speed and accuracy.

Bo Cheng, Zhanyu Ma, Yuan Wu +3

Multimodal Models · Speech & Audio
