Wenbo Su

Research focus

RLHF & Preference Learning (2)
Training Efficiency & Optimization (1)
Reasoning & Chain-of-Thought (1)
Eval Frameworks & Benchmarks (1)

Frequent co-authors

Yuchi Xu (4)
Yancheng He (2)
Weixun Wang (2)
Xiaoyang Li (2)

Papers (6)

Jing Liang +11 · Jun 28, 2026

The Mirage of Optimizing Training Policies: Monotonic Inference Policies as the Real Objective for LLM Reinforcement Learning

Training updates that improve performance in LLMs can actually degrade inference quality—unless you use the new Monotonic Inference Policy Update framework.

Jing Liang, Hongyao Tang, Yi Ma +9

RLHF & Preference Learning · Training Efficiency & Optimization

DAMO · Jun 9, 2026 · also HIT, Shanghai AI Lab, SJTU

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

FlowTracer reveals that optimizing token-level rewards based on attention-induced information flow can dramatically enhance reasoning performance in LLMs.

Zhichen Dong, Yuhan Sun, Zinian Peng +5

Reasoning & Chain-of-Thought · RLHF & Preference Learning

Jun 1, 2026 · also PKU, Zhongguancun Laboratory

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

TVIR-Agent reveals that integrating visual elements into report generation can dramatically improve the quality and reliability of analytical outputs.

Xinkai Ma, Zhiqi Bai, Dingling Zhang +21

Eval Frameworks & Benchmarks · Multimodal Models

You Wu +6 · Apr 15, 2026 · also Tsinghua AI

YOCO++: Enhancing YOCO with KV Residual Connections for Efficient LLM Inference

YOCO++ proves you can halve the KV cache size in LLMs and still beat a standard Transformer, thanks to a clever residual connection trick.

You Wu, Ziheng Chen, Yizhen Zhang +4

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization

DAMO · Jan 4, 2026 · also Fudan, Tongji

Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement

Failure-driven post-training, combined with a meticulously curated 10M-token STEM dataset, unlocks a 4.68% performance boost in LLM reasoning, proving that strategic data synthesis around model weaknesses is a powerful path to improvement.

Mingyu Xu, Cheng Fang, Keyue Jiang +16

Dec 31, 2025 · also CAS, ECNU, Fudan, GIST Guangdong +6

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

An open-source ecosystem for agentic learning, complete with a trained agent and novel policy optimization, promises to accelerate research by providing a standardized, scalable platform.

Weixun Wang, Xiaoxiao Xu, Wanhe An +85

Open-Source Models & Weights · Tool Use & Agents
