Current optimism-based exploration in RLHF can incur linear regret, but a new uncertainty-focused exploration strategy achieves regret that scales polynomially in all model parameters.