The paper introduces MUTATE, an interactive benchmark that evaluates divergent thinking in LLM agents at two levels: the diversity of solution paths, scored for both completed and off-path attempts, and the diversity of individual actions. Experiments show that LLMs struggle with action-level divergence under convergence pressure and exhibit action fixation. To mitigate this, the authors propose ReDNA, a method that decouples divergent candidate generation from convergent constraint selection. ReDNA outperforms prior methods on MUTATE and generalizes to an external creativity environment.
Under pressure to converge, LLM agents show a "structural blind spot": they fixate on their initial actions and fail to explore diverse solutions.
Divergent thinking is a core dimension of creativity, yet existing evaluations of Large Language Models (LLMs) treat them as single-turn text generators, failing to capture how an agent reasons through iterative interaction. To address this, we introduce MUTATE, an interactive benchmark designed to evaluate agentic divergent thinking at two levels: path-level, where an agent discovers multiple alternative paths to the same goal, and action-level, where individual actions require non-typical, mechanism-shifting object uses. Unlike success-only evaluations, MUTATE scores both completed paths and off-path attempts, capturing divergent reasoning that conventional success rates discard. Our experiments with frontier LLMs reveal a structural blind spot in existing frameworks: when exposed to immediate convergence pressure, agents tend to fall into immediate action fixation and fail to improve action-level divergence. To overcome this, we propose ReDNA, which separates unconstrained divergent candidate generation from convergent constraint selection. ReDNA significantly outperforms prior methods across both divergence levels and generalizes effectively to an external creativity environment. We also confirm that its success stems from a qualitative enhancement of resilient divergent reasoning rather than from simple environmental exploration.
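The separation of divergent candidate generation from convergent constraint selection can be illustrated with a minimal sketch. This is not the authors' ReDNA implementation, which the abstract does not specify. The function `generate_then_select`, the toy proposer, the feasibility set, and the novelty scores are all hypothetical and stand in for an LLM proposer and an environment's constraint checker.

```python
from typing import Callable, Iterable, List


def generate_then_select(
    propose: Callable[[str], Iterable[str]],
    is_feasible: Callable[[str], bool],
    novelty: Callable[[str], float],
    state: str,
    k: int = 2,
) -> List[str]:
    """Hypothetical two-phase step: diverge first, converge second."""
    # Phase 1 (divergent): collect candidates with no constraint checks,
    # dropping exact duplicates while preserving order.
    candidates = list(dict.fromkeys(propose(state)))
    # Phase 2 (convergent): only now filter by the environment's constraints.
    feasible = [c for c in candidates if is_feasible(c)]
    # Rank the feasible candidates from least to most typical, keep the top k.
    return sorted(feasible, key=novelty, reverse=True)[:k]


# Toy stand-ins (assumptions for illustration only).
NOVELTY = {
    "use key on door": 0.1,
    "pry door with spoon": 0.7,
    "melt lock with candle": 0.9,
    "teleport": 1.0,
}
FEASIBLE = {"use key on door", "pry door with spoon", "melt lock with candle"}


def toy_propose(state: str) -> List[str]:
    # A real proposer would query a model; this one returns a fixed list.
    return [
        "use key on door",
        "pry door with spoon",
        "melt lock with candle",
        "use key on door",
        "teleport",
    ]


chosen = generate_then_select(
    toy_propose, FEASIBLE.__contains__, NOVELTY.__getitem__, "locked room"
)
print(chosen)
```

In this toy run the infeasible "teleport" is discarded only in the second phase, so it never suppresses the other atypical candidates during generation. A single-phase loop that checks constraints while proposing would be more likely to settle on the typical "use key on door".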