Entropy regularization makes planning provably easy: SmoothCruiser achieves polynomial sample complexity in MDPs where standard methods fail.
Log-barrier regularization unlocks optimal Õ(t^{-1/4}) last-iterate convergence in uncoupled matrix games with bandit feedback, finally closing the gap to the theoretical limit.
Ditch reward models: Nash Mirror Prox achieves fast, stable convergence to a Nash equilibrium directly from human preferences, sidestepping the limitations of traditional RLHF.
A single algorithm now solves both rested and restless rotting bandits, problems previously thought to require fundamentally different approaches.