Search papers, labs, and topics across Lattice.
2
0
5
2
Thompson Sampling can be just as efficient with pairwise preference feedback as it is with scalar rewards, opening up new avenues for optimization in human-in-the-loop and experimental design scenarios.
Bypassing the headache of aligning mismatched vocabularies, byte-level distillation offers a surprisingly simple and effective approach to transferring knowledge between language models using different tokenizers.