This paper introduces the Alpha Excel Benchmark, a novel evaluation suite for LLMs based on 113 challenges from the Financial Modeling World Cup (FMWC) converted into programmatically evaluable JSON formats. The benchmark assesses LLM performance on realistic, business-oriented tasks, addressing a gap between abstract academic benchmarks and practical applications. Results show significant performance variations across LLMs, with strengths in pattern recognition but weaknesses in complex numerical reasoning.
Forget abstract reasoning benchmarks: this new Excel-based challenge suite reveals how LLMs actually perform on realistic financial modeling tasks in a tool used by 1.5 billion people daily.
This study presents a novel benchmark for evaluating Large Language Models (LLMs) using challenges derived from the Financial Modeling World Cup (FMWC) Excel competitions. We introduce a methodology for converting 113 existing FMWC challenges into a programmatically evaluable JSON format and use the resulting dataset to compare the performance of several leading LLMs. Our findings show significant variation in performance across challenge categories, with models exhibiting particular strength in pattern recognition tasks but struggling with complex numerical reasoning. The benchmark provides a standardized framework for assessing LLM capabilities on realistic, business-oriented tasks rather than abstract academic problems. This research contributes to the growing field of AI benchmarking by establishing proficiency in Microsoft Excel, a tool used daily by 1.5 billion people, as a meaningful evaluation metric that bridges the gap between academic AI benchmarks and practical business applications.
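The paper's actual JSON schema is not reproduced on this page, so the sketch below only illustrates what a programmatically evaluable challenge record and a simple scorer could look like. Every field name (`id`, `category`, `prompt`, `expected_answer`) and both helper functions are assumptions for illustration, not the authors' format.

```python
import json

# Hypothetical shape of one converted FMWC challenge. The paper's real
# schema is not shown here; all field names below are illustrative.
example_challenge = {
    "id": "fmwc-042",
    "category": "numerical_reasoning",  # e.g. pattern recognition, numerical reasoning
    "prompt": "Given the monthly sales figures ..., compute the Q3 total.",
    "expected_answer": 18250,           # ground truth for programmatic checking
}

def evaluate(challenge: dict, model_answer: str, tol: float = 1e-6) -> bool:
    """Score one challenge against its stored ground truth.

    Numeric answers are compared within a tolerance; anything that
    cannot be parsed as a number falls back to an exact string match.
    """
    expected = challenge["expected_answer"]
    try:
        return abs(float(model_answer) - float(expected)) <= tol
    except (TypeError, ValueError):
        return str(model_answer).strip() == str(expected).strip()

def score_suite(path: str, answers: dict) -> float:
    """Return the fraction of challenges answered correctly.

    `answers` maps challenge id -> the model's output string.
    """
    with open(path) as f:
        challenges = json.load(f)
    correct = sum(evaluate(c, answers.get(c["id"], "")) for c in challenges)
    return correct / len(challenges)

if __name__ == "__main__":
    print(evaluate(example_challenge, "18250"))  # True
```

Comparing numeric answers within a tolerance rather than by strict string equality is one plausible way to make financial results programmatically checkable despite formatting differences (e.g. "18250" vs. "18250.0") in model output.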