Pretrained ALiBi transformers suffer from a widespread attention collapse that can be surgically repaired, yielding a 25% reduction in perplexity and suggesting that standard pretraining leaves performance on the table.
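For context, ALiBi (Attention with Linear Biases; Press et al., 2022) replaces positional embeddings with a per-head linear penalty on attention scores proportional to query-key distance. A minimal sketch of that bias follows; the function name `alibi_bias` and its interface are illustrative, not taken from the paper:

```python
import torch


def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Build the additive ALiBi bias: each head penalizes attention
    scores linearly with the distance between key and query positions."""
    # Per-head slopes form a geometric sequence; for 8 heads this is
    # 1/2, 1/4, ..., 1/256 (standard choice when num_heads is a power of 2).
    slopes = torch.tensor(
        [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]
    )
    # distance[i, j] = i - j for past positions (j <= i), 0 elsewhere;
    # future positions are handled separately by the causal mask.
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)  # (seq, seq)
    # Shape (heads, seq, seq): added to attention logits before softmax.
    return -slopes[:, None, None] * distance


# Usage: scores = q @ k.transpose(-1, -2) / d ** 0.5 + alibi_bias(h, t)
```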