Forget red-teaming: POLARIS automatically turns safety policies into attack strategies, finding more LLM vulnerabilities with verifiable traceability.
The trustworthiness of LLM-enabled applications hinges not on further model improvements, but on establishing system-level threat monitoring to detect post-deployment anomalies.
Self-evolving LLM agents can be persistently compromised by injecting malicious payloads into their long-term memory, turning them into "zombie agents" that execute unauthorized actions across sessions.