Search papers, labs, and topics across Lattice.
1
0
3
Achieve transformer interpretability by disentangling token and context processing streams, with only a 2.5% performance hit using Kronecker mixing.