CMU ML | Mar 30, 2026 | arXiv:2603.29025

The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

Yubo Li, Lu Zhang, Tianchong Jiang, Ramayya Krishnan, R. Padman

AI Summary

The paper identifies a systematic failure in LLMs' reasoning abilities when surface-level heuristics conflict with implicit feasibility constraints, using a "diagnose-measure-bridge-treat" framework. Through analysis of the "car wash problem" and the Heuristic Override Benchmark (HOB), the authors demonstrate that LLMs are overly influenced by salient cues like distance, often ignoring underlying constraints. Interventions like minimal hints and goal-decomposition prompting improve performance, suggesting the issue stems from constraint inference rather than knowledge gaps.
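As a rough illustration of what goal-decomposition prompting could look like, the sketch below wraps a question in instructions that ask the model to enumerate preconditions before answering. The function name `goal_decomposition_prompt`, the step wording, and the sample question are all hypothetical; the source does not give the authors' actual prompt text.

```python
def goal_decomposition_prompt(question: str) -> str:
    """Wrap a question so the model lists preconditions before answering.

    Hypothetical template: the step wording is an assumption, not the
    prompt used in the paper.
    """
    steps = [
        "State the goal of the request in one sentence.",
        "List every object or condition that must be in place to achieve the goal.",
        "Check each candidate answer against all of those preconditions.",
        "Only then give the final answer.",
    ]
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"{question.strip()}\n\n"
        f"Before answering, work through these steps:\n{numbered}"
    )


# Illustrative question only; the wording is not taken from the source.
prompt = goal_decomposition_prompt(
    "The car wash is 200 m away. Should I walk or drive to get my car washed?"
)
print(prompt)
```

The point of the template is ordering: the precondition list is requested before any answer, so a salient cue such as distance has to be weighed against the stated goal.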

Key Contribution

LLMs often choose an infeasible answer when a salient surface heuristic, such as distance, conflicts with an unstated feasibility constraint.

Abstract

Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword associations than compositional inference. The Heuristic Override Benchmark (HOB), comprising 500 instances spanning 4 heuristic by 5 constraint families with minimal pairs and explicitness gradients, demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75%, and presence constraints are hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers +15 pp on average, suggesting the failure lies in constraint inference rather than missing knowledge; 12/14 models perform worse when the constraint is removed (up to -39 pp), revealing conservative bias. Parametric probes confirm that the sigmoid pattern generalizes to cost, efficiency, and semantic-similarity heuristics; goal-decomposition prompting recovers +6 to 9 pp by forcing models to enumerate preconditions before answering. Together, these results characterize heuristic override as a systematic reasoning vulnerability and provide a benchmark for measuring progress toward resolving it.
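To make the "strict evaluation (10/10 correct)" criterion concrete, here is a minimal sketch of how such a score might be computed. It assumes that each benchmark instance is sampled 10 times and counts as solved only if every sample is correct; the abstract does not spell out the protocol, so this reading, the function names, and the toy data are all assumptions.

```python
from typing import Dict, List


def strict_accuracy(trials: Dict[str, List[bool]], required: int = 10) -> float:
    """Fraction of instances that are correct on all `required` trials.

    Assumed reading of "10/10 correct"; not the authors' reference code.
    """
    if not trials:
        raise ValueError("no instances to score")
    solved = 0
    for instance_id, outcomes in trials.items():
        if len(outcomes) != required:
            raise ValueError(f"{instance_id}: expected {required} trials")
        if all(outcomes):  # a single miss fails the whole instance
            solved += 1
    return solved / len(trials)


def mean_accuracy(trials: Dict[str, List[bool]]) -> float:
    """Plain per-trial accuracy, for comparison with the strict score."""
    total = sum(len(outcomes) for outcomes in trials.values())
    return sum(sum(outcomes) for outcomes in trials.values()) / total


# Toy data (made up): four instances, ten trials each.
example = {
    "a": [True] * 10,
    "b": [True] * 9 + [False],
    "c": [False] * 10,
    "d": [True] * 10,
}
print(strict_accuracy(example))  # 0.5
print(mean_accuracy(example))    # 0.725
```

In the toy data, instance "b" is right nine times out of ten yet contributes nothing to the strict score, which is why strict numbers can sit well below per-trial accuracy.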

Eval Frameworks & Benchmarks; Interpretability & Mechanistic Interp; Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 48

Year: 2026

Venue: N/A
