Forget entropy regularization: this work shows how directly inferring a distribution over optimal policies in Markov decision processes (MDPs), using a modified variational sequential Monte Carlo (VSMC) method, yields a stochastic control policy via Thompson sampling.