This paper introduces the concept of "semantic override" in LLMs, where models fail to adhere to locally redefined semantics of operators and gates in reasoning tasks, instead reverting to their pretrained interpretations. The authors create a micro-benchmark of 30 logic and digital-circuit reasoning tasks to systematically evaluate this failure mode, along with a related error called "assumption injection." Experiments on three frontier LLMs demonstrate persistent noncompliance with local specifications, highlighting a gap in specification-faithful reasoning.
LLMs often fail to follow operator and gate semantics that are explicitly redefined in the prompt, falling back on their pretrained interpretations even in simple logic and circuit reasoning tasks.
Large language models (LLMs) demonstrate strong performance on standard digital logic and Boolean reasoning tasks, yet their reliability under locally redefined semantics remains poorly understood. In many formal settings, such as circuit specifications, examinations, and hardware documentation, operators and components are explicitly redefined within a narrow scope. Correct reasoning in these contexts requires models to temporarily suppress globally learned conventions in favor of prompt-local definitions. In this work, we study a systematic failure mode we term semantic override, in which an LLM reverts to its pretrained default interpretation of operators or gate behavior despite explicit redefinition in the prompt. We also identify a related class of errors, assumption injection, where models commit to unstated hardware semantics when critical details are underspecified, rather than requesting clarification. We introduce a compact micro-benchmark of 30 logic and digital-circuit reasoning tasks designed as verifier-style traps, spanning Boolean algebra, operator overloading, redefined gates, and circuit-level semantics. Evaluating three frontier LLMs, we observe persistent noncompliance with local specifications, confident but incompatible assumptions, and dropped constraints even in elementary settings. Our findings highlight a gap between surface-level correctness and specification-faithful reasoning, motivating evaluation protocols that explicitly test local unlearning and semantic compliance in formal domains.
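To make the idea of a verifier-style trap concrete, the following is a minimal sketch of how a single redefined-gate task might be checked. It is not taken from the paper's benchmark: the circuit, the redefinition of AND as NOR, and the names `evaluate`, `is_trap`, and `classify` are all hypothetical, chosen only to illustrate how a specification-faithful answer can be told apart from a pretrained-default one.

```python
from typing import Callable, Dict

Gate = Callable[[int, int], int]

# Conventional gate meanings, standing in for the "pretrained default".
DEFAULT_GATES: Dict[str, Gate] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}

# Hypothetical prompt-local redefinition (an assumption for illustration):
# "In this problem, AND outputs 1 only when both inputs are 0", i.e. NOR.
LOCAL_GATES: Dict[str, Gate] = {
    **DEFAULT_GATES,
    "AND": lambda a, b: 1 - (a | b),
}


def evaluate(gates: Dict[str, Gate], a: int, b: int, c: int) -> int:
    """Evaluate the toy circuit out = OR(AND(a, b), c) under the given semantics."""
    return gates["OR"](gates["AND"](a, b), c)


def is_trap(a: int, b: int, c: int) -> bool:
    """An input is a useful trap only if local and default semantics disagree."""
    return evaluate(LOCAL_GATES, a, b, c) != evaluate(DEFAULT_GATES, a, b, c)


def classify(answer: int, a: int, b: int, c: int) -> str:
    """Label a model's answer on a trap input."""
    if not is_trap(a, b, c):
        return "uninformative"  # both readings give the same output
    if answer == evaluate(LOCAL_GATES, a, b, c):
        return "compliant"
    if answer == evaluate(DEFAULT_GATES, a, b, c):
        return "semantic_override"
    return "other_error"


print(classify(0, 1, 1, 0))  # answer follows the local redefinition
print(classify(1, 1, 1, 0))  # answer follows the conventional AND
```

The design point in this sketch is that inputs are filtered through `is_trap`: an input on which both readings yield the same output cannot reveal which semantics the model applied, so only disagreeing inputs are informative.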