Aligning noise with token embeddings makes vision-language models more resilient to jailbreaking attacks, slashing success rates on the JailBreakV-28K benchmark.
Test-time RL, intended to boost LLM reasoning, can backfire when exposed to adversarial prompts, amplifying harmful tendencies and degrading reasoning performance.