The paper introduces GSU, a text-based grid dataset designed to evaluate LLMs' spatial reasoning across navigation, object localization, and structure composition tasks. By removing visual inputs, the authors isolate spatial reasoning from perception and demonstrate that while LLMs understand basic grid concepts, they struggle with embodied frames of reference and with identifying 3D shapes from coordinates. Fine-tuning smaller models on GSU shows promise in matching the performance of larger frontier models, suggesting a path toward specialized embodied agents.
LLMs struggle with spatial reasoning in embodied settings and 3D structure identification even when exposed to visual modalities, but fine-tuning smaller models offers a surprisingly effective alternative to brute-force scaling.
We introduce GSU, a text-only grid dataset for evaluating the spatial reasoning capabilities of LLMs across three core tasks: navigation, object localization, and structure composition. By forgoing visual inputs, we isolate spatial reasoning from perception and show that while most models grasp basic grid concepts, they struggle with frames of reference relative to an embodied agent and with identifying 3D shapes from coordinate lists. We also find that exposure to a visual modality does not give VLMs a generalizable understanding of 3D space that they can apply to these tasks. Finally, we show that while the very latest frontier models can solve the provided tasks (though harder variants may still stump them), both fully fine-tuning a small LM and LoRA fine-tuning a small LLM show potential to match frontier-model performance, suggesting an avenue for specialized embodied agents.