2025

GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks

AI Summary

This paper establishes a benchmark for evaluating large language models (LLMs) on multi-step geospatial tasks relevant to commercial GIS practitioners, and develops an LLM-as-Judge evaluation framework that compares agent solutions against reference implementations.

Citation Metrics

Citations: 3

Influential citations: 0

References: 14

Year: 2025

Venue: arXiv.org
