LLMs struggle to detect subtle autonomy-undermining manipulation tactics, with performance varying by up to 25% across harm categories, according to a new benchmark.