LLM agents perform surprisingly poorly on Capture The Flag challenges: even the best models complete only 35% of checkpoints, revealing a significant gap in their ability to carry out realistic offensive security tasks.