Off-policy reinforcement learning can boost LLM reasoning by 12.5% and solve 40% more problems than on-policy methods, simply by re-evaluating and reusing historically difficult samples.
Ditch the min-max: Fuz-RL offers a fuzzy-measure guided approach to safe RL that achieves distributional robustness without complex optimization.