Ricardo Silveira Cabral

Research focus

Eval Frameworks & Benchmarks (2) · Tool Use & Agents (2) · World Models & Planning (1) · RLHF & Preference Learning (1)

Frequent co-authors

A. Budhiraja (2) · Vladislav Vorotilov (2) · Romain Froger (1) · Pierre Andrews (1)

Papers (2)

Feb 12, 2026

Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments

GPT-5 can ace most agent benchmarks, but put it in a dynamic, real-world environment and it chokes on time-sensitive tasks, exposing a critical "sim2real" gap.

Romain Froger, Pierre Andrews, Matteo Bettini +21

Eval Frameworks & Benchmarks · Tool Use & Agents · World Models & Planning

Feb 20, 2025

Apple ML · also UCSB

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

LLMs can now play at being AI researchers, but they're mostly just good at hyperparameter sweeps, not groundbreaking discoveries.

Deepak Nathani, Lovish Madaan, Nicholas Roberts +1448

Eval Frameworks & Benchmarks · RLHF & Preference Learning · Tool Use & Agents
