Thompson Sampling can be as efficient with pairwise preference feedback as it is with scalar rewards, which opens new avenues for optimization in human-in-the-loop and experimental-design settings.