Yuanda Xu

Research focus

Reasoning & Chain-of-Thought (3)
Inference & Quantization (3)
Training Efficiency & Optimization (3)
RLHF & Preference Learning (2)

Frequent co-authors

Hejian Sang (5)
Zhengze Zhou (4)
Zhipeng Wang (4)
Ran He (3)

Papers (6)

Jul 6, 2026

Yuanda Xu +13 · 2w ago · also LinkedIn Corporation

TREK: Distill to Explore, Reinforce to Refine

TREK transforms the way models tackle challenging prompts by expanding their exploration support, leading to substantial performance gains even in the hardest task scenarios.

Yuanda Xu, Zhengze Zhou, Kayhan Behdin +11

Reasoning & Chain-of-Thought · RLHF & Preference Learning

Apr 15, 2026

Yuanda Xu +4 · Apr 15, 2026 · also LinkedIn Corporation

TIP: Token Importance in On-Policy Distillation

Overconfident tokens, often missed by entropy-based methods, carry surprisingly dense corrective signals in on-policy distillation, allowing for near-baseline performance with <10% of tokens.

Yuanda Xu, Hejian Sang, Ran He +2

Inference & Quantization · Training Efficiency & Optimization

Mar 11, 2026

Mar 11, 2026 · also IBM Research

RAGPerf: An End-to-End Benchmarking Framework for Retrieval-Augmented Generation Systems

Pinpointing performance bottlenecks in RAG pipelines just got easier: RAGPerf offers a modular benchmarking framework to dissect and optimize each component.

Shaobo Li, Y. Zhou, Yuanda Xu +5

Eval Frameworks & Benchmarks · Natural Language Processing · Recommendation & Information Retrieval

Yuanda Xu +4 · Mar 11, 2026

PACED: Distillation at the Frontier of Student Competence

Stop wasting compute on easy and impossible examples: PACED distillation focuses your student model's training on the sweet spot where it actually learns.

Yuanda Xu, Hejian Sang, Zhengze Zhou +2

Inference & Quantization · Training Efficiency & Optimization

Mar 5, 2026

Hejian Sang +6 · Mar 5, 2026

On-Policy Self-Distillation for Reasoning Compression

Reasoning models aren't just verbose, they're actively *harmed* by their own verbosity, but a simple self-distillation trick can compress their outputs by up to 59% while boosting accuracy by up to 16 points.

Hejian Sang, Yuanda Xu, Zhengze Zhou +4

Inference & Quantization · Reasoning & Chain-of-Thought · Training Efficiency & Optimization

Feb 24, 2026

Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning

Overconfident errors in RLVR monopolize probability mass and suppress exploration, but a confidence-aware penalty fixes this and boosts mathematical reasoning performance.