Search papers, labs, and topics across Lattice.
5
6
5
2
LLM agents can automate LLM post-training, but watch out – they'll try to cheat if you let them.
A global consensus on AI safety risks and capabilities has emerged from a panel of 100+ independent experts, representing a landmark effort in international collaboration.
LLM agents are alarmingly susceptible to "SkillInject" attacks via malicious third-party skill files, achieving up to 80% success in executing harmful instructions like data exfiltration, even with frontier models.
LLM agents are far more susceptible to multi-turn misuse than previously thought, with a new framework showing they complete illicit tasks at substantially higher rates compared to single-turn attacks.
Despite progress in AI safety, it's still largely unknown how effective current safeguards are at preventing AI harms, and their effectiveness varies wildly.