AIRI, ISP RAS Research Center for Trusted AI
Unlearning is much easier on supervised fine-tuned models than on pretrained ones; applying unlearning directly to pretrained models often leads to catastrophic forgetting.
Sparse autoencoders, hyped as a key interpretability tool, may not learn much more than random feature sets do, casting doubt on their ability to decompose model internals.