Federico Bianchi

Agents collaborating on EinsteinArena achieved breakthroughs that surpassed previous human and AI solutions, showcasing the power of collective intelligence in scientific discovery.

Federico Bianchi, Yongchan Kwon, Aneesh Pappu

Scientific Discovery & Drug Design · Tool Use & Agents

May 25, 2026

Stanford HAI · May 25, 2026 · also Duke, Together

Automated Benchmark Auditing for AI Agents and Large Language Models

Over a quarter of tasks in popular AI benchmarks contain critical flaws that distort model evaluations, and this automated auditing framework can catch them.

Federico Bianchi, Shang Zhu, Fan Nie +2

Eval Frameworks & Benchmarks · Tool Use & Agents

May 21, 2026

Stanford HAI · May 21, 2026 · also Together

Evaluating Commercial AI Chatbots as News Intermediaries

Despite impressive headline accuracy, today's AI chatbots exhibit alarming regional biases, near-total dependence on retrieval quality, and surprising vulnerability to subtle falsehoods in user queries when used as news intermediaries.

Mirac Suzgun, Emily Shen, Federico Bianchi +3

Eval Frameworks & Benchmarks · Natural Language Processing · Recommendation & Information Retrieval


Papers (4)