Search papers, labs, and topics across Lattice.
2
0
4
0
LLMs can be made far more robust to the position of information in long contexts by simply shuffling the context during fine-tuning.
Forget painstaking hyperparameter tuning: this hypersphere parameterization lets you transfer a single learning rate across model sizes, depths, and even MoE architectures, slashing compute costs by 1.58x.