Apr 30, 2026 · arXiv:2604.28031

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

AI Summary

DriftBench is a new benchmark that evaluates constraint adherence in multi-turn LLM-assisted scientific ideation: it measures whether models stay faithful to the original objective as an idea is refined over successive turns. The study finds that iterative pressure reliably increases structural complexity and often reduces adherence to the original constraints, and that models frequently violate constraints they can accurately restate (knows-but-violates, or KBV). Structured checkpointing partially reduces KBV rates, but the dissociation between recall and adherence persists, which poses a significant challenge for using LLMs in iterative idea refinement.

Key Contribution

LLMs can accurately recall constraints while simultaneously violating them: "knows-but-violates" (KBV) rates range from 8% to 99% across the seven models tested, showing that declarative recall of a constraint does not guarantee behavioral adherence to it in multi-turn ideation.

Abstract

When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark runs spanning seven models from five providers (including two open-weight), four interaction conditions, and 38 research briefs from 24 scientific domains, we find that iterative pressure reliably increases structural complexity and often reduces adherence to original constraints. A restatement probe reveals a dissociation between declarative recall and behavioral adherence, as models accurately restate constraints they simultaneously violate. The knows-but-violates (KBV) rate, measuring constraint non-compliance despite preserved recall, ranges from 8% to 99% across models. Structured checkpointing partially reduces KBV rates but does not close the dissociation, and complexity inflation persists. Human validation against blind raters confirms that the LLM judge under-detects constraint violations, making reported constraint adherence scores conservative. Sensitivity analyses confirm the findings are robust to temperature (0.7 vs. 1.0) and pressure type (novelty vs. rigor). We release all briefs, prompts, rubrics, transcripts, and scores as an open benchmark.
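To make the KBV metric concrete, here is a minimal sketch of how such a rate could be computed. It rests on an assumption the abstract does not confirm: that the rate is taken over constraints the model restated accurately, and counts the fraction of those that were also violated. The paper's exact denominator may differ. The `ConstraintCheck` record and the `kbv_rate` function are hypothetical names used only for illustration, not part of the released benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstraintCheck:
    """One constraint in one run (hypothetical record layout)."""
    recalled: bool  # restatement probe judged the constraint accurately restated
    violated: bool  # judge flagged the proposal as violating the constraint


def kbv_rate(checks):
    """Fraction of accurately recalled constraints that were also violated.

    Assumed definition (not taken from the paper). Returns None when no
    constraint was recalled, since the rate is then undefined.
    """
    recalled = [c for c in checks if c.recalled]
    if not recalled:
        return None
    return sum(c.violated for c in recalled) / len(recalled)


# Toy data: four constraints recalled, two of them violated.
checks = [
    ConstraintCheck(recalled=True, violated=True),
    ConstraintCheck(recalled=True, violated=False),
    ConstraintCheck(recalled=True, violated=True),
    ConstraintCheck(recalled=False, violated=True),  # excluded: not recalled
    ConstraintCheck(recalled=True, violated=False),
]
print(kbv_rate(checks))  # 0.5
```

Conditioning on recall is what separates "knows-but-violates" from plain forgetting: a violation of a constraint the model could no longer restate is excluded from the numerator and denominator under this assumed definition.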

Eval Frameworks & Benchmarks · Natural Language Processing · Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
