KRAFTON
Diversity-aware scoring transforms MoE models into dense architectures, boosting downstream accuracy by over 6% while speeding up training.
By dynamically balancing fast adaptation and stable averaging, AMUSE delivers faster convergence and better final performance than AdamW and Muon, all without any learning rate tuning.
Forget RLHF's quirks: aligning LLMs is fundamentally a distribution learning problem, and preference distillation offers a theoretically sound and empirically strong alternative.