BAIR | Feb 26, 2026 | arXiv:2602.22583

Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance

Weida Liang, Yiyou Sun, Shuyuan Nan, Chuang Li, Chuan Li, Dawn Song, Kenji Kawaguchi

AI Summary

The paper investigates why example-based guidance for mathematical reasoning is unstable, attributing the instability to a dissociation between strategy usage and strategy executability, with human- and model-derived strategies exhibiting complementary strengths. The authors find that strategies effective for humans may not be executable by models, and vice versa, which produces performance reversals under guidance. To address this, they propose Selective Strategy Retrieval (SSR), a framework that models executability by selectively retrieving and combining strategies based on empirical, multi-route, source-aware signals.
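To make the usage/executability distinction concrete, here is a minimal sketch under stated assumptions: usage is taken as the share of successful solutions from a given source that contain a strategy, and executability as the mean accuracy change over direct solving when that strategy is handed to the target model as guidance. The record types (`Solution`, `GuidedRun`), the strategy labels, and the toy data are all hypothetical, invented for illustration; the paper's own measurement protocol may differ.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record types; field names are illustrative, not taken from the paper.
@dataclass(frozen=True)
class Solution:
    source: str            # "human" or "model"
    strategies: frozenset  # strategy labels annotated on this reference solution
    correct: bool          # did the solution reach the correct answer?

@dataclass(frozen=True)
class GuidedRun:
    source: str            # where the guiding strategy was extracted from
    strategy: str          # the strategy given to the target model as guidance
    correct_direct: bool   # target model correct when solving directly (no guidance)
    correct_guided: bool   # target model correct when given the strategy

def usage(solutions, strategy, source):
    """Share of successful `source` solutions that contain `strategy`."""
    wins = [s for s in solutions if s.source == source and s.correct]
    if not wins:
        return 0.0
    return sum(strategy in s.strategies for s in wins) / len(wins)

def executability(runs, strategy, source):
    """Mean accuracy change vs. direct solving when `strategy` from `source` guides the model."""
    deltas = [int(r.correct_guided) - int(r.correct_direct)
              for r in runs if r.strategy == strategy and r.source == source]
    return mean(deltas) if deltas else 0.0

# Toy data, invented for illustration (not results from the paper): a strategy common
# in human solutions that hurts the target model, and a model-derived one that helps.
solutions = [
    Solution("human", frozenset({"aux_construction"}), True),
    Solution("human", frozenset({"aux_construction", "casework"}), True),
    Solution("model", frozenset({"casework"}), True),
    Solution("model", frozenset({"aux_construction"}), False),
]
runs = [
    GuidedRun("human", "aux_construction", True, False),
    GuidedRun("human", "aux_construction", False, False),
    GuidedRun("model", "casework", False, True),
    GuidedRun("model", "casework", True, True),
]

print(usage(solutions, "aux_construction", "human"))     # 1.0
print(executability(runs, "aux_construction", "human"))  # -0.5
```

In the toy data, `aux_construction` appears in every successful human solution (usage 1.0) yet lowers the target model's accuracy when used as guidance (executability -0.5), while the model-derived `casework` raises it (0.5). This is the kind of source-dependent reversal the summary describes, reproduced on made-up numbers only.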

Key Contribution

Human-written solutions can actually *hurt* model performance on math problems. This exposes a gap between strategy usage and strategy executability, which Selective Strategy Retrieval (SSR) is designed to close.

Abstract

Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its effectiveness is highly unstable across problems and models, even when the guidance is correct and problem-relevant. We show that this instability arises from a previously underexplored gap between strategy usage (whether a reasoning strategy appears in successful solutions) and strategy executability (whether the strategy remains effective when instantiated as guidance for a target model). Through a controlled analysis of paired human-written and model-generated solutions, we identify a systematic dissociation between usage and executability: human- and model-derived strategies differ in structured, domain-dependent ways, leading to complementary strengths and consistent source-dependent reversals under guidance. Building on this diagnosis, we propose Selective Strategy Retrieval (SSR), a test-time framework that explicitly models executability by selectively retrieving and combining strategies using empirical, multi-route, source-aware signals. Across multiple mathematical reasoning benchmarks, SSR yields reliable and consistent improvements over direct solving, in-context learning, and single-source guidance, improving accuracy by up to +13 points on AIME25 and +5 points on Apex for compact reasoning models. Code and benchmark are publicly available at: https://github.com/lwd17/strategy-execute-pipeline.
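The abstract names SSR's signals (empirical, multi-route, source-aware) but not its exact selection rule. The sketch below is one hypothetical way to combine them, assuming: each candidate strategy is proposed by one or more retrieval routes (route names such as `similar_problem` are invented here), an offline table `gain` holds the empirically measured accuracy change of each (strategy, source) pair for the target model, and agreement across routes earns a small additive bonus. `Candidate`, `select_strategies`, `route_bonus`, and `min_score` are illustrative names, not the authors' API; the linked repository holds the actual method.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    strategy: str  # strategy label (hypothetical)
    source: str    # "human" or "model"
    route: str     # retrieval route that proposed it; route names are invented

def select_strategies(candidates, gain, k=2, route_bonus=0.05, min_score=0.0):
    """Toy SSR-style selection (an assumption, not the paper's algorithm).

    gain: dict mapping (strategy, source) -> empirically measured accuracy change
    vs. direct solving for the target model, estimated separately per source.
    Returns up to k (strategy, source) pairs; [] means "solve directly".
    """
    # Multi-route signal: record which routes retrieved each (strategy, source) pair.
    routes = defaultdict(set)
    for c in candidates:
        routes[(c.strategy, c.source)].add(c.route)

    scored = []
    for key, seen_by in routes.items():
        if key not in gain:
            continue  # no empirical executability evidence for this source: skip
        # Empirical, source-aware gain plus a small bonus for cross-route agreement.
        score = gain[key] + route_bonus * (len(seen_by) - 1)
        if score > min_score:  # drop strategies expected to hurt or not help
            scored.append((score, key))

    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by name
    return [key for _, key in scored[:k]]

candidates = [
    Candidate("casework", "model", "similar_problem"),
    Candidate("casework", "model", "same_topic"),
    Candidate("aux_construction", "human", "similar_problem"),
    Candidate("invariant", "human", "same_topic"),
]
gain = {  # made-up numbers for illustration
    ("casework", "model"): 0.10,
    ("aux_construction", "human"): -0.05,
    ("invariant", "human"): 0.02,
}
print(select_strategies(candidates, gain))
# [('casework', 'model'), ('invariant', 'human')]
```

Returning an empty list when no candidate clears the threshold keeps direct solving as the default, which is the baseline the abstract compares against; the additive route bonus and the zero threshold are arbitrary illustrative choices.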

Eval Frameworks & Benchmarks; Reasoning & Chain-of-Thought; Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 48

Year: 2026

Venue: N/A
