Smaller initialization scales in early layers unlock superior fine-tuning generalization by enabling both feature reuse and refinement, challenging the intuition that larger initializations are always better.