Market-Bench is a new benchmark that evaluates LLMs as retailer agents in a simulated supply chain, assessing their ability to procure inventory through auctions and to set retail prices and write marketing slogans. The benchmark logs bids, prices, sales, and agent balance sheets, and scores them with economic, operational, and semantic metrics. Results show a "winner-take-most" dynamic: only a few LLMs consistently profit, while many hover near break-even despite similar semantic matching scores, highlighting performance disparities in economically relevant tasks.
LLMs with similar semantic skills show wildly different economic performance in simulated markets, revealing that reasoning about competition and resource allocation remains a major challenge.
The ability of large language models (LLMs) to manage and acquire economic resources remains unclear. In this paper, we introduce \textbf{Market-Bench}, a comprehensive benchmark that evaluates the capabilities of LLMs in economically relevant tasks through economic and trade competition. Specifically, we construct a configurable multi-agent supply chain economic model in which LLMs act as retailer agents responsible for procuring and retailing merchandise. In the \textbf{procurement} stage, LLMs bid for limited inventory in budget-constrained auctions. In the \textbf{retail} stage, LLMs set retail prices and generate marketing slogans, which are presented to buyers through a role-based attention mechanism to inform their purchases. Market-Bench logs complete trajectories of bids, prices, slogans, sales, and balance-sheet states, enabling automatic evaluation with economic, operational, and semantic metrics. Benchmarking 20 open- and closed-source LLM agents reveals significant performance disparities and a winner-take-most phenomenon, \textit{i.e.}, only a small subset of LLM retailers consistently achieve capital appreciation, while many hover around the break-even point despite similar semantic matching scores. Market-Bench provides a reproducible testbed for studying how LLMs interact in competitive markets.
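To make the procurement stage concrete, the following is a minimal sketch of one budget-constrained auction round. The abstract does not specify the auction format, so everything here is an assumption for illustration: the pay-as-bid allocation rule, the `Retailer` class, the `run_procurement` function, and the example numbers are hypothetical and are not taken from Market-Bench.

```python
from dataclasses import dataclass


@dataclass
class Retailer:
    """Hypothetical balance-sheet state for one retailer agent."""
    name: str
    cash: float
    inventory: int = 0


def run_procurement(retailers, bids, supply):
    """Allocate `supply` units to the highest per-unit bids (pay-as-bid).

    `bids` maps retailer name -> (unit_price, quantity). This rule is an
    illustrative assumption, not Market-Bench's actual auction format.
    Returns the number of units left unsold.
    """
    by_name = {r.name: r for r in retailers}
    # Highest unit price is served first; ties break by name for determinism.
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1][0], kv[0]))
    for name, (unit_price, quantity) in ranked:
        if supply <= 0:
            break
        if unit_price <= 0 or quantity <= 0:
            continue
        retailer = by_name[name]
        # Budget constraint: a retailer cannot buy more than its cash covers.
        affordable = int(retailer.cash // unit_price)
        units = min(quantity, affordable, supply)
        retailer.cash -= units * unit_price
        retailer.inventory += units
        supply -= units
    return supply


# Example round: three retailers with equal budgets compete for 12 units.
retailers = [Retailer("A", 100.0), Retailer("B", 100.0), Retailer("C", 100.0)]
bids = {"A": (12.0, 10), "B": (10.0, 5), "C": (8.0, 5)}
leftover = run_procurement(retailers, bids, supply=12)
```

In this toy round, the highest bidder is capped by its budget rather than by its requested quantity, and the lowest bidder receives nothing once the limited inventory runs out. That is the kind of competitive pressure the abstract describes, although the real benchmark may resolve bids differently.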