NTT Human Informatics Laboratories
Forget scaling laws: surgically debiasing reward models by intervening on just 2% of neurons lets smaller models punch *way* above their weight in alignment.