Search papers, labs, and topics across Lattice.
1
0
2
A reward-free observer can achieve the same asymptotic efficiency as a fully reward-aware learner in Inverse Contextual Bandits, simply by ignoring the learner's initial exploratory actions.