Maxim Hájek

Papers on Lattice

Total citations

Topics

h-index

Research focus

Eval Frameworks & Benchmarks (1), Red-Teaming & Adversarial Robustness (1), Tool Use & Agents (1)

Frequent co-authors

Ali Al-Kaswan (1), Maksim Plotnikov (1), Maxim Hájek (1), Roland Vízner (1)

Papers (1)

Apr 21, 2026

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges

LLM agents struggle with Capture The Flag challenges: even the best models complete only 35% of checkpoints, revealing a significant gap in their ability to perform realistic offensive security tasks.

Ali Al-Kaswan, Maksim Plotnikov, Maxim Hájek +6

Eval Frameworks & Benchmarks, Red-Teaming & Adversarial Robustness, Tool Use & Agents
