A benchmark for evaluating large language models (LLMs) on multi-step geospatial tasks relevant to commercial GIS practitioners is established, and an LLM-as-Judge evaluation framework is developed to compare agent solutions against reference implementations.
Turns out, Claude 3.5 Sonnet and o4-mini are surprisingly good at geospatial tasks, outperforming even GPT-4.1 and Gemini 2.5 Pro Preview on a new benchmark for tool-calling LLMs.
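To make the LLM-as-Judge idea concrete, here is a minimal Python sketch of how an agent's solution might be scored against a reference implementation. The rubric wording, the 0-10 scale, and all function names are illustrative assumptions, not the benchmark's actual protocol; the judge model call is left as a generic callable so any chat-completion API can be plugged in.

```python
# Hypothetical sketch of an LLM-as-Judge comparison for a geospatial task.
# The prompt, scale, and names are assumptions for illustration only.
from dataclasses import dataclass
from typing import Callable

JUDGE_PROMPT = """You are grading a GIS agent's solution to a multi-step geospatial task.

Task description:
{task}

Reference implementation:
{reference}

Agent solution:
{solution}

Compare the agent solution against the reference. Judge whether it performs
equivalent geoprocessing steps and would produce an equivalent result.
Reply with a single line: SCORE: <integer 0-10> followed by a one-sentence justification."""


@dataclass
class JudgeResult:
    score: int
    rationale: str


def judge_solution(
    task: str,
    reference: str,
    solution: str,
    call_judge_llm: Callable[[str], str],  # wrapper around any chat-completion API
) -> JudgeResult:
    """Ask a judge LLM to score an agent solution against a reference (0-10)."""
    reply = call_judge_llm(
        JUDGE_PROMPT.format(task=task, reference=reference, solution=solution)
    )
    # Expect a line like "SCORE: 7 ..."; fall back to 0 if the judge did not comply.
    for line in reply.splitlines():
        if line.strip().upper().startswith("SCORE:"):
            rest = line.split(":", 1)[1].strip()
            first = rest.split()[0] if rest else "0"
            score = int(first) if first.isdigit() else 0
            return JudgeResult(score=score, rationale=rest)
    return JudgeResult(score=0, rationale=reply.strip())
```

In practice a judge like this would be run per task and aggregated across the benchmark to produce the per-model comparisons described above.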