Bingxiang He

AI coding agents excel at translating scientific tasks into familiar formats but struggle to achieve true scientific discovery, with only 17.8% surpassing state-of-the-art benchmarks.

Yuru Wang, Lejun Cheng, Yuxin Zuo +14

Eval Frameworks & Benchmarks · Scientific Discovery & Drug Design

Jun 1, 2026

Amazon Science · Jun 1, 2026 · also Emory, Penn State

Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams

Sustained self-improvement in LLM agents is achievable through a novel adaptive framework that outperforms traditional methods in dynamic task environments.

Zewen Liu, Zhan Shi, Yisi Sang +7

Eval Frameworks & Benchmarks · Tool Use & Agents

Mar 9, 2026

Tsinghua AI · Mar 9, 2026

How Far Can Unsupervised RLVR Scale LLM Training?

Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.

Bingxiang He, Yuxin Zuo, Zeyuan Liu +23

RLHF & Preference Learning · Scalable Oversight & Alignment Theory · Training Efficiency & Optimization


Bingxiang He

Publication activity (papers/week, last 8 weeks)

Research focus

Frequent co-authors

Papers (5)