Forget scaling laws: pre-training LLMs on just 164M tokens of synthetic, non-linguistic data can outperform pre-training on 1.6B tokens of Common Crawl, opening a new path to efficient model training.