Multilingual MoEs can achieve best-in-class performance-to-compute ratios, even at extreme sparsity, when they are strategically upcycled from dense models, and they exhibit structured expert activation patterns across languages.