Stop wasting your finetuning data: Specialized Pretraining (SPT) can outperform standard pretraining and finetuning, achieving better domain performance with fewer parameters and less compute.
Multilingual interference isn't a fundamental capacity limit but a data problem: targeted curation of just 8% of your training data can yield 4x to 10x FLOPs savings.