Expert upcycling lets you scale mixture-of-experts (MoE) models for 32% less compute by intelligently duplicating and specializing existing experts, challenging the need to train massive MoEs from scratch.