Huazhong University of Science and Technology, Wuhan, China
Safety in MoE LLMs isn't about routing harmful requests to "refusal experts"; it's surprisingly localized within specific experts, and you can break it without significantly changing the model's overall routing behavior.
Merging seemingly safe LLMs can create dangerously misaligned models, thanks to a new "TrojanMerge" attack that exploits latent vulnerabilities.