BITS Pilani | Apr 7, 2026 | arXiv:2604.05681

LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo

AI Summary

LudoBench is introduced as a new benchmark for evaluating strategic reasoning in LLMs, using the game of Ludo to create complex, stochastic scenarios. The benchmark consists of 480 handcrafted board states designed to isolate 12 distinct strategic decision categories. Evaluation of six LLMs shows that they agree with a game-theory baseline only 40-46% of the time, and that they exhibit distinct behavioral archetypes and sensitivity to prompt framing.

Key Contribution

LLMs struggle with strategic play even in a simple board game like Ludo: the six evaluated models agree with a depth-limited Expectiminimax game-theory baseline only 40-46% of the time, and their choices shift under history-conditioned grudge framing of identical board states.

Abstract

We introduce LudoBench, a benchmark for evaluating LLM strategic reasoning in Ludo, a stochastic multi-agent board game whose dice mechanics, piece capture, safe-square navigation, and home-path progression introduce meaningful planning complexity. LudoBench comprises 480 handcrafted spot scenarios across 12 behaviorally distinct decision categories, each isolating a specific strategic choice. We additionally contribute a fully functional 4-player Ludo simulator supporting Random, Heuristic, Game-Theory, and LLM agents. The game-theory agent uses Expectiminimax search with depth-limited lookahead to provide a principled strategic ceiling beyond greedy heuristics. Evaluating six models spanning four model families, we find that all models agree with the game-theory baseline only 40-46% of the time. Models split into distinct behavioral archetypes: finishers that complete pieces but neglect development, and builders that develop but never finish. Each archetype captures only half of the game-theory strategy. Models also display measurable behavioral shifts under history-conditioned grudge framing on identical board states, revealing prompt sensitivity as a key vulnerability. LudoBench provides a lightweight and interpretable framework for benchmarking LLM strategic reasoning under uncertainty. All code, the spot dataset (480 entries), and model outputs are available at https://anonymous.4open.science/r/LudoBench-5CBF/
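To make the idea of Expectiminimax with depth-limited lookahead concrete, the following is a minimal sketch on a toy game, not the paper's simulator or agent. Everything here is an assumption for illustration: a two-player game on a single shared track, the track length, the safe squares, the positional evaluation, and all names (`State`, `legal_moves`, `apply_move`, `evaluate`, `best_move`). Chance nodes average over the six die faces; decision nodes maximize for the searching player and minimize for the opponent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TRACK_END = 20                        # hypothetical track length (finish square)
SAFE = frozenset({0, 8, TRACK_END})   # hypothetical squares where captures cannot happen
DICE = range(1, 7)                    # fair six-sided die


@dataclass(frozen=True)
class State:
    # tokens[p] holds the track positions of player p's tokens (toy 2-player game)
    tokens: Tuple[Tuple[int, ...], Tuple[int, ...]]


def legal_moves(state: State, player: int, roll: int) -> list:
    """Indices of the player's tokens that can advance `roll` steps without overshooting."""
    return [i for i, pos in enumerate(state.tokens[player]) if pos + roll <= TRACK_END]


def apply_move(state: State, player: int, token: int, roll: int) -> State:
    """Advance one token; an opposing token on a non-safe landing square returns to 0."""
    mine = list(state.tokens[player])
    theirs = list(state.tokens[1 - player])
    mine[token] += roll
    if mine[token] not in SAFE:
        theirs = [0 if pos == mine[token] else pos for pos in theirs]
    pair = (tuple(mine), tuple(theirs))
    return State(pair if player == 0 else pair[::-1])


def evaluate(state: State, root: int) -> float:
    """Toy heuristic: total progress of the root player minus the opponent's."""
    return float(sum(state.tokens[root]) - sum(state.tokens[1 - root]))


def finished(state: State) -> bool:
    return any(all(pos == TRACK_END for pos in side) for side in state.tokens)


def chance_value(state: State, to_move: int, root: int, depth: int) -> float:
    """Chance node: expected value over the die roll that `to_move` is about to make."""
    if depth == 0 or finished(state):
        return evaluate(state, root)
    return sum(decision_value(state, to_move, root, r, depth) for r in DICE) / len(DICE)


def decision_value(state: State, to_move: int, root: int, roll: int, depth: int) -> float:
    """Decision node: the root player maximizes, the opponent minimizes."""
    moves = legal_moves(state, to_move, roll)
    if not moves:  # no legal move: the turn passes
        return chance_value(state, 1 - to_move, root, depth - 1)
    values = [
        chance_value(apply_move(state, to_move, m, roll), 1 - to_move, root, depth - 1)
        for m in moves
    ]
    return max(values) if to_move == root else min(values)


def best_move(state: State, player: int, roll: int, depth: int = 2) -> Optional[int]:
    """Pick the token index with the highest depth-limited expectiminimax value."""
    moves = legal_moves(state, player, roll)
    if not moves:
        return None
    return max(
        moves,
        key=lambda m: chance_value(apply_move(state, player, m, roll), 1 - player, player, depth - 1),
    )
```

In this toy setting, `best_move(State(((3, 10), (5, 0))), player=0, roll=2)` selects token 0, because moving it from square 3 to square 5 captures the opposing token there. A faithful Ludo agent would additionally need four players, per-player start offsets, yard-exit rules, and home paths, none of which are modelled in this sketch.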

Eval Frameworks & Benchmarks | Reasoning & Chain-of-Thought | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
