Stop guessing how much to pretrain vs. specialize your language model – scaling laws can now tell you the optimal compute allocation for maximizing performance on downstream tasks.
Tri-modal masked diffusion models can now be trained from scratch, achieving strong results in text generation, text-to-image, and text-to-speech, thanks to a systematic exploration of the design space and a novel SDE-based batch size reparameterization.
Small language models (SLMs) that get facts wrong can be made more truthful by teaching them *what* to delegate, not just by minimizing loss.