Search papers, labs, and topics across Lattice.
1
0
0
0
Even with perfect knowledge of the optimal Q-function, learning in offline RL with partial coverage can be surprisingly hard, but a new framework based on decision-estimation coefficients (DEC) offers a path to tighter bounds and modular algorithm design.