Forget static layer selection: GRASS dynamically selects which layers to fine-tune based on gradient norms, yielding memory savings and accuracy gains.
Squeeze 34% more decode speed out of your MoE model without sacrificing accuracy by intelligently budgeting expert activations.