Unlock $\sqrt{N}$ regret in offline policy learning, even with complex policy classes, by trading off policy and environment complexity.
Unlock asymptotically normal and semiparametrically efficient estimators in adaptive data collection by using a novel target-specific condition called "directional stability," which is weaker than previous target-agnostic conditions.