Search papers, labs, and topics across Lattice.
1
0
1
3
Forget reward function dependencies – this new approach to contextual bandits with latent state dynamics achieves stronger regret bounds by directly modeling hidden state dependencies and adaptively estimating HMM parameters.