Yuhan Wang

Stop hand-feeding your LLM clinical data: ClinSeekAgent actively seeks and synthesizes multimodal evidence, boosting Claude Opus's performance by 15% on multimodal tasks.

Juncheng Wu, Letian Zhang, Yuhan Wang +5

Multimodal Models · Reasoning & Chain-of-Thought · Tool Use & Agents

Apr 16, 2026

School of Artificial Intelligence · also CAS, Hangzhou Medical College, SJTU, UCSC

PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research

LLMs are still far from being autonomous scientists, failing to master even simplified, end-to-end physics research workflows.

Tingjia Miao, Wenkai Jin, Muhua Zhang +15

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · Scientific Discovery & Drug Design

Mar 17, 2026

UC Santa Cruz · also BAIR, UCSC, UNC

Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation

LVLMs can be made significantly less prone to hallucinations, without any training, by explicitly grounding them in visual evidence and iteratively self-refining their answers based on verified information.

Haoqin Tu, Yuhan Wang, Zeyu Zheng +1

Eval Frameworks & Benchmarks · Multimodal Models

Mar 16, 2026

also CUHK

Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness

Current reward models for spoken dialogue systems are missing crucial paralinguistic and natural speech elements, but this new model closes the gap by operating directly on speech and outperforming existing audio LLMs.

Yuhan Wang, Fan Zhuo, Xize Cheng +6

Natural Language Processing · RLHF & Preference Learning · Speech & Audio

Mar 4, 2026

also CUHK

Training-Free Rate-Distortion-Perception Traversal With Diffusion

Achieve adaptive, perception-aware image compression without any training by simply steering a pre-trained diffusion model.

Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang

Computer Vision · Inference & Quantization · Training Efficiency & Optimization

Apr 2, 2025

UC Santa Cruz · also Mila, CUHK, UCSC

STAR-1: Safer Alignment of Reasoning LLMs with 1K Data

Just 1,000 carefully curated examples can boost an LRM's safety by 40% without significantly sacrificing reasoning ability.