Search papers, labs, and topics across Lattice.
1
0
3
7
Bypassing the headache of aligning mismatched vocabularies, byte-level distillation offers a surprisingly simple and effective approach to transferring knowledge between language models using different tokenizers.