Search papers, labs, and topics across Lattice.
The University of Tokyo
3
0
5
Centering advantages in policy gradients can drastically reduce variance and improve performance in reinforcement learning tasks.
OrderGrad transforms policy-gradient optimization by enabling precise control over distributional properties, allowing for risk-averse and exploratory learning in real-world applications.
LLMs that ace medical exams still fumble basic clinical judgment, prematurely deciding cases or abstaining unnecessarily when information is incomplete, revealing a critical gap in their real-world applicability.