By aligning hidden representations, CRAFT improves reasoning safety by 79%, suggesting that latent-space interventions are a potent defense against jailbreaks.
Google's SynthID-Text, a state-of-the-art LLM watermarking system, can be broken by a layer inflation attack, revealing vulnerabilities in its mean score detection method.