LMU Munich, University of Copenhagen, Munich Center for Machine Learning
Output vector editing can suppress up to 87.9% of memorized sequences in large language models, significantly outperforming traditional neuron-level methods.
LLM-labeled data can match human-labeled data in aggregate performance for hostility detection, but be warned: the errors are systematically different, especially in nuanced cases.