University of Southern California
Current self-evolving prompt optimization frameworks falter when faced with the diverse memory extraction demands of real-world LLM assistants, but a simple clustering approach can restore generalization.
Even safety-aligned agents such as Claude 4.5 Sonnet can be tricked into harmful actions at a success rate above 90% simply through benign user instructions given within specific task contexts, revealing a major blind spot in current safety evaluations.