Qi Gu

Skill0.5 achieves state-of-the-art out-of-distribution generalization in agentic RL by intelligently combining skill internalization and utilization, outperforming methods that rely solely on one or the other.

Jiapeng Zhu, Jianxiang Yu, Yibo Zhao +6

RLHF & Preference Learning Robotics & Embodied AI Tool Use & Agents

3w ago·also CUHK, Eastern Institute of Technology, SJTU, ZJU

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection

GUI agents can learn world knowledge more efficiently by internalizing causal relationships during mid-training, rather than relying on implicit learning through action annotations or reward signals in post-training.

Zhengxi Lu, Yanyu Chen, Qi Gu +2

Multimodal Models Tool Use & Agents

May 26, 2026

3w ago·also NUS, Tsinghua AI, BUPT, Meituan +2

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

Current LLM agents still struggle to infer and leverage user preferences from fragmented, real-world interactions, revealing a substantial gap between their capabilities and the demands of personalized decision-making.

Yuxin Chen, Yi Zhang, Zhengzhou Cai +8

Eval Frameworks & Benchmarks Recommendation & Information Retrieval Tool Use & Agents

3w ago·also NUS, Tsinghua AI, Meituan, TJU +1

Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments

LLM agents trained with simulated user and tool noise not only become more robust in messy real-world environments, but also surprisingly improve on clean, idealized benchmarks.

Yuxin Chen, Xiaodong Cai, Junfeng Fang +6

Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness Tool Use & Agents

Apr 29, 2026

Tianhao Hu +14·Apr 29, 2026

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

Asynchronous RL for LLMs doesn't have to sacrifice convergence for speed: DORA achieves 2-4x faster training by cleverly managing multiple policy versions during rollout.

Tianhao Hu, Xiangcheng Liu, Youshao Xiao +12

Distributed Systems & Hardware RLHF & Preference Learning Training Efficiency & Optimization

Apr 20, 2026

Wentao Shi +13·Apr 20, 2026

AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation

Agent-as-a-Judge can outperform LLM-as-a-Judge in complex environments, but still struggles to reliably verify agent behavior, revealing a critical gap in current LLM-based agent evaluation.

Wentao Shi, Yu Wang, Yuyang Zhao +11

Eval Frameworks & Benchmarks Tool Use & Agents

Apr 2, 2026

Zhengxi Lu +12·Apr 2, 2026·also Meituan, ZJU

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

LLM agents can internalize skills via in-context RL, achieving zero-shot autonomous behavior without the token overhead and retrieval noise of traditional methods.

Zhengxi Lu, Zhiyuan Yao, Zhiyuan Yao +10

RLHF & Preference Learning Tool Use & Agents Training Efficiency & Optimization

Mar 11, 2026

Yikai Zhang +6·Mar 11, 2026

$V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts

Forget hand-tuning rollout budgets: $V_{0.5}$ dynamically allocates compute to sparse RL rollouts based on a real-time statistical test of a generalist value model's prior, slashing variance and boosting performance.

Yikai Zhang, Yueqing Sun, Hongyan Hao +4

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Papers (9)