University of Oxford
Coordinating AI agents gets a reliability boost: BOT-Orch uses bandit learning and optimal transport to decide which agents receive which tasks, even when agent behavior is unpredictable.
Single-rollout RL can rival multi-rollout performance for LLM reasoning, thanks to a new batchwise advantage-estimation technique that substantially improves value-function accuracy.