Search papers, labs, and topics across Lattice.
1
0
3
Forget painstaking hyperparameter tuning: this hypersphere parameterization lets you transfer a single learning rate across model sizes, depths, and even MoE architectures, slashing compute costs by 1.58x.