Classical Chinese, with its conciseness and obscurity, offers a surprisingly effective attack vector against LLM safety filters, one that can be exploited automatically via bio-inspired optimization.
LLMs can move beyond simple refusals to actively guide vulnerable users toward safe outcomes, achieving state-of-the-art safety and robustness against jailbreaks.