Search papers, labs, and topics across Lattice.
National University of Singapore
3
0
6
Code-generating LLMs may ace static benchmarks, but developers are actually *slower* when using them because they disrupt mental flow, highlighting the need for benchmarks that capture the temporal dynamics of coding.
The trustworthiness of LLM-enabled applications hinges not on further model improvements, but on establishing system-level threat monitoring to detect post-deployment anomalies.
Self-evolving LLM agents can be persistently compromised by injecting malicious payloads into their long-term memory, turning them into "zombie agents" that execute unauthorized actions across sessions.