Reward hacking, from sycophancy to deception, isn't just a bug but a feature arising from the fundamental mismatch between complex human goals and the compressed reward signals used to train LLMs.
Multi-turn reinforcement learning gets a boost: weighting trajectories by semantic similarity dramatically improves baseline estimation and agent performance in long-document visual QA.
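The blurb doesn't reproduce the paper's estimator, so here is a minimal NumPy sketch of one way similarity-weighted baselines could work in a group-sampling (GRPO-style) setup: each sampled trajectory's baseline is the similarity-weighted mean reward of its group peers. The cosine-similarity choice, function names, and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def similarity_weighted_baselines(embeddings: np.ndarray, rewards: np.ndarray) -> np.ndarray:
    """For each trajectory, estimate a baseline as the similarity-weighted
    mean reward of the *other* trajectories sampled for the same prompt."""
    # Cosine similarity between trajectory embeddings.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, 0.0)          # a trajectory never contributes to its own baseline
    weights = np.clip(sim, 0.0, None)   # keep only non-negative similarities as weights
    weights /= weights.sum(axis=1, keepdims=True) + 1e-8
    return weights @ rewards

# Toy group of 4 sampled trajectories: random embeddings, scalar rewards.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
r = np.array([1.0, 0.0, 0.5, 1.0])
advantages = r - similarity_weighted_baselines(emb, r)  # advantage for the policy update
print(advantages)
```

Intuitively, semantically similar trajectories make better reward comparators than an unweighted group mean, which lowers the variance of the advantage estimate.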
Forget end-to-end training: breaking down long-context reasoning into atomic skills and training on targeted pseudo-data unlocks a 7.7% performance boost.
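The blurb doesn't spell out the skill taxonomy or data pipeline, so as a rough illustration only, here is a minimal Python sketch of targeted pseudo-data for one hypothetical atomic skill, retrieval from long context: plant a known fact at a random depth inside filler text and ask about it. The skill choice, templates, and all names are assumptions.

```python
import random

# Synthetic filler that carries no task-relevant information.
FILLER = [f"Background sentence number {i} with no useful information." for i in range(1000)]

def make_retrieval_example(fact: str, question: str, answer: str, n_sentences: int = 200) -> dict:
    """Pseudo-training example for an atomic 'retrieval' skill: the answer is
    recoverable only by locating the planted fact inside a long context."""
    context = random.sample(FILLER, n_sentences)
    context.insert(random.randrange(len(context)), fact)  # random insertion depth
    return {"context": " ".join(context), "question": question, "answer": answer}

example = make_retrieval_example(
    fact="The launch code for the toy system is 4-1-7-7.",
    question="What is the launch code for the toy system?",
    answer="4-1-7-7",
)
print(len(example["context"]), "|", example["question"], "->", example["answer"])
```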
GPT-5's scientific reasoning performance plummets by nearly 50% when tackling multi-step workflows, revealing a critical gap in current LLM agents' ability to orchestrate complex tool use.
Finally, a fully open-source, reproducible system for long-form song generation is here, complete with licensed data, code, and a Qwen-based model that rivals closed-source systems.