Yale University
Greedy off-policy learning, though optimal in theory, can fail spectacularly when supplies are limited; a simple fix (prioritizing items with high *relative* reward) can restore performance.
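
To make the failure mode concrete, here is a toy sketch, not the paper's method. It assumes two users arriving in a fixed order, two items with one unit of supply each, and known rewards. "Relative reward" is taken here to mean a user's reward for an item minus that item's average reward across users; that definition, the numbers, and the names `allocate`, `greedy_score`, and `relative_score` are all illustrative assumptions.

```python
def allocate(rewards, supply, score):
    """Serve users in arrival order; each gets the in-stock item with the top score."""
    stock = dict(supply)  # copy so the caller's supply is not mutated
    total = 0.0
    for user_rewards in rewards:
        available = [item for item in stock if stock[item] > 0]
        if not available:
            break  # nothing left to recommend
        choice = max(available, key=lambda item: score(user_rewards, item))
        stock[choice] -= 1
        total += user_rewards[choice]
    return total


# Hypothetical rewards: the first user likes both items, the second only item1.
rewards = [
    {"item1": 1.0, "item2": 0.9},
    {"item1": 0.9, "item2": 0.1},
]
supply = {"item1": 1, "item2": 1}

# Assumed baseline for "relative" reward: each item's mean reward over all users.
item_mean = {item: sum(r[item] for r in rewards) / len(rewards) for item in supply}


def greedy_score(user_rewards, item):
    # Greedy: rank items by the user's raw reward.
    return user_rewards[item]


def relative_score(user_rewards, item):
    # Relative: rank items by how much this user beats the item's average.
    return user_rewards[item] - item_mean[item]


greedy_total = allocate(rewards, supply, greedy_score)
relative_total = allocate(rewards, supply, relative_score)
print(f"greedy: {greedy_total:.1f}, relative: {relative_total:.1f}")
```

In this toy case the greedy rule gives the first user item1, leaving the second user with an item they barely value (total 1.1). The relative rule saves item1 for the user who depends on it (total 1.8).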