Fine-tuning LLMs doesn't have to trash their safety: adaptively regularizing updates based on predicted harmful intent keeps models aligned without sacrificing utility.
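
A minimal sketch of the core idea, under assumptions not stated in the summary: here the "adaptive regularization" is taken to be an L2 pull toward the original aligned weights, scaled by a per-batch harmfulness score. The names `adaptive_reg_loss`, `harm_score`, and `lam` are hypothetical, and the actual method may use a different penalty or classifier.

```python
import torch
import torch.nn as nn

def adaptive_reg_loss(task_loss, model, ref_params, harm_score, lam=1.0):
    # Assumed penalty form: L2 distance from the aligned reference weights,
    # scaled by predicted harmfulness. Benign batches (harm_score ~ 0)
    # fine-tune nearly unregularized; suspicious batches are pinned close
    # to the original, safety-aligned weights.
    reg = sum(((p - p0) ** 2).sum()
              for p, p0 in zip(model.parameters(), ref_params))
    return task_loss + lam * harm_score * reg

# Toy usage: one fine-tuning step on a small linear stand-in for an LLM.
model = nn.Linear(8, 2)
ref_params = [p.detach().clone() for p in model.parameters()]  # frozen aligned copy
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
harm_score = 0.9  # placeholder: in practice, output of a harmful-intent predictor
loss = adaptive_reg_loss(nn.functional.cross_entropy(model(x), y),
                         model, ref_params, harm_score)
loss.backward()
opt.step()
```

The design intuition is that a fixed regularization strength either blocks useful adaptation or lets harmful updates through; conditioning the strength on predicted intent lets the same training run do both.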