Even in a seemingly simple tabular environment like Blackjack, model-free RL agents can converge to near-optimal *average* rewards while still making surprisingly poor decisions in specific states.
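One way to see how this can happen is that average reward weights each state by how often it is visited, so a large mistake in a rarely reached state barely moves the average. The sketch below is a toy illustration only: the state names, visitation probabilities, and per-state regret values are hypothetical and are not measured from any Blackjack agent, and `average_regret` is a helper introduced here for illustration.

```python
def average_regret(visit_prob, regret):
    """Visitation-weighted average of per-state regret.

    visit_prob: dict mapping state -> probability of visiting that state
    regret:     dict mapping state -> expected reward lost by the policy's
                action in that state, relative to the best action
    """
    return sum(visit_prob[s] * regret[s] for s in visit_prob)


# Hypothetical numbers, for illustration only (not measured from any agent).
visit_prob = {"common_a": 0.60, "common_b": 0.38, "rare": 0.02}
regret = {"common_a": 0.0, "common_b": 0.0, "rare": 0.5}

avg = average_regret(visit_prob, regret)
worst = max(regret, key=regret.get)

print(f"average regret: {avg:.3f}")
print(f"worst state: {worst} (regret {regret[worst]:.2f})")
```

Under these assumed numbers the policy loses half a unit of expected reward whenever it reaches the rare state, yet the visitation-weighted average stays small, which is why a per-state comparison can reveal errors that an average-reward curve hides.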