Unlock hidden performance in your pre-trained language models with "inner looping," a simple inference-time trick that repeatedly refines latent representations by re-applying selected transformer blocks.
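A minimal sketch of what such an inner loop might look like. The function name, the `loop_indices` selection, and the toy linear block are all hypothetical illustrations, not the paper's actual implementation:

```python
import numpy as np

def inner_loop_forward(hidden, blocks, loop_indices, num_loops=3):
    """Run a stack of blocks, re-applying selected ones at inference time.

    hidden: (seq_len, d_model) latent representations
    blocks: list of callables, each mapping hidden -> refined hidden
    loop_indices: indices of blocks to repeat num_loops times
    """
    for i, block in enumerate(blocks):
        repeats = num_loops if i in loop_indices else 1
        for _ in range(repeats):
            hidden = block(hidden)  # each pass further refines the latents
    return hidden

# Toy "block": a small linear map plus residual, standing in for a real layer.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(8, 8))
block = lambda h: h + h @ W

x = rng.normal(size=(4, 8))
# Re-apply the second block three times instead of once.
y = inner_loop_forward(x, [block, block], loop_indices={1}, num_loops=3)
```

Because no weights change, this costs only extra forward passes on the selected blocks.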
Transformers suffer from a subtle but significant misalignment: residual connections inadvertently tie information to the *wrong* token. A simple residual attenuation fix can recover the lost performance.
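One plausible reading of "residual attenuation" is scaling the skip path by a factor below one before merging it with the sublayer output, weakening information carried over from earlier in the residual stream. A minimal sketch under that assumption (the function name and `alpha` hyperparameter are illustrative, not from the source):

```python
import numpy as np

def attenuated_residual(hidden, sublayer, alpha=0.9):
    """Residual connection with the skip path scaled by alpha < 1.

    Damping the residual stream reduces how strongly stale, possibly
    misbound information persists across layers. `alpha` is a
    hypothetical hyperparameter chosen for illustration.
    """
    return alpha * hidden + sublayer(hidden)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
sub = lambda h: np.tanh(h)  # toy sublayer standing in for attention/MLP
out = attenuated_residual(x, sub, alpha=0.8)
```

With `alpha=1.0` this reduces to the standard residual connection, so the fix is a one-line change to an existing block.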