ByteDance | Feb 24, 2026 | arXiv:2602.20973

Linear Reasoning vs. Proof by Cases: Obstacles for Large Language Models in FOL Problem Solving

Yuliang Ji, Fuchen Shen, Fuchen Shen, Jian Wu, Qiujie Xie

AI Summary

The paper introduces PC-FOL, a new first-order logic dataset designed to evaluate case-based reasoning in LLMs, addressing the limitations of existing datasets that primarily focus on linear reasoning. Experiments on leading LLMs reveal a significant performance gap between linear and case-based reasoning tasks within PC-FOL. A theoretical analysis using graphical models is provided to explain the observed performance disparity.

Key Contribution

LLMs struggle significantly with case-based reasoning in first-order logic, performing far worse than on linear reasoning tasks, exposing a key limitation in their mathematical capabilities.

Abstract

To comprehensively evaluate the mathematical reasoning capabilities of Large Language Models (LLMs), researchers have introduced abundant mathematical reasoning datasets. However, most existing datasets primarily focus on linear reasoning, neglecting other proof strategies such as proof by contradiction and proof by cases, which are crucial for investigating LLMs' reasoning abilities. To address this limitation, we first introduce a novel first-order logic (FOL) dataset named PC-FOL, annotated by professional mathematicians, focusing on case-based reasoning problems. All instances in this dataset are equipped with a manually written natural language proof, clearly distinguishing it from conventional linear reasoning datasets. Our experimental results on leading LLMs demonstrate a substantial performance gap between linear reasoning and case-based reasoning problems. To further investigate this phenomenon, we provide a theoretical analysis grounded in graphical models, which offers an explanation for the observed disparity between the two types of reasoning problems. We hope this work can reveal the core challenges in the field of automated natural language mathematical proof generation, paving the way for future research.
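
The distinction between the two proof styles can be illustrated with a minimal Lean 4 sketch. The propositions and hypothesis names (`P`, `Q`, `R`, `hpq`, and so on) are hypothetical and are not drawn from PC-FOL; the paper's proofs are written in natural language, so this only shows the logical shape of each style.

```lean
-- Linear reasoning: a single chain of implications,
-- where each step follows directly from the previous one.
example (P Q R : Prop) (hp : P) (hpq : P → Q) (hqr : Q → R) : R :=
  hqr (hpq hp)

-- Proof by cases: the hypothesis is a disjunction, so the goal
-- must be established separately in each branch.
example (P Q R : Prop) (h : P ∨ Q) (hpr : P → R) (hqr : Q → R) : R :=
  h.elim hpr hqr
```

In the first example the proof is one path from `P` to `R`. In the second, neither `P` nor `Q` is known on its own, so the argument has to branch and reach `R` along both paths.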

Eval Frameworks & Benchmarks | Natural Language Processing | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0
Influential citations: 0
References: 30
Year: 2026
Venue: N/A
