LLM watermarks can now survive fine-tuning, quantization, and distillation thanks to a new method that embeds them in a stable functional subspace.
By enforcing graph isomorphism across counterfactual inputs, UGID shows that LLMs can be debiased by directly manipulating their internal representations and attention mechanisms.
Backdoor attacks can now hide in plain sight: by delaying activation, even common words become viable triggers, opening a new, stealthier attack surface in pre-trained models.