The paper introduces VANGUARD, a Geometric Perception Skill that recovers Ground Sample Distance (GSD) for UAVs operating in GPS-denied environments by using detected vehicles as environmental anchors. VANGUARD estimates the modal pixel length of detected vehicles with kernel density estimation and converts it to GSD using a pre-calibrated reference length, returning a GSD estimate together with a confidence score. Experiments on the DOTA v1.5 benchmark and a custom area measurement benchmark show that, when integrated with SAM-based segmentation, VANGUARD reduces spatial scale hallucinations relative to state-of-the-art VLMs.
VLMs hallucinate spatial scales, but this new tool lets UAVs in GPS-denied environments estimate Ground Sample Distance from vehicle detections, yielding 2.6x lower category dependence and 4x fewer catastrophic failures in area estimation than the best VLM baseline.
Autonomous aerial robots operating in GPS-denied or communication-degraded environments frequently lose access to camera metadata and telemetry, leaving onboard perception systems unable to recover the absolute metric scale of the scene. As LLM/VLM-based planners are increasingly adopted as high-level agents for embodied systems, their ability to reason about physical dimensions becomes safety-critical, yet our experiments show that five state-of-the-art VLMs suffer from spatial scale hallucinations, with median area estimation errors exceeding 50%. We propose VANGUARD, a lightweight, deterministic Geometric Perception Skill designed as a callable tool that any LLM-based agent can invoke to recover Ground Sample Distance (GSD) from ubiquitous environmental anchors: small vehicles detected via oriented bounding boxes, whose modal pixel length is robustly estimated through kernel density estimation and converted to GSD using a pre-calibrated reference length. The tool returns both a GSD estimate and a composite confidence score, enabling the calling agent to autonomously decide whether to trust the measurement or fall back to alternative strategies. On the DOTA v1.5 benchmark, VANGUARD achieves 6.87% median GSD error on 306 images. Integrated with SAM-based segmentation for downstream area measurement, the pipeline yields 19.7% median error on a 100-entry benchmark, with 2.6x lower category dependence and 4x fewer catastrophic failures than the best VLM baseline, demonstrating that equipping agents with deterministic geometric tools is essential for safe autonomous spatial reasoning.
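The core conversion described in the abstract (modal vehicle length in pixels, then GSD from a reference length) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `modal_length_kde` and `estimate_gsd` are hypothetical, the 4.5 m reference length is a placeholder rather than the paper's calibrated value, and the Gaussian kernel with a normal-reference bandwidth is an assumed choice. The composite confidence score is omitted because the abstract does not define it.

```python
import math
import statistics

# Placeholder reference length in metres (assumption, not the paper's calibrated value).
REFERENCE_VEHICLE_LENGTH_M = 4.5


def modal_length_kde(lengths_px, grid_points=512):
    """Return the mode of a Gaussian KDE fitted to vehicle lengths in pixels."""
    n = len(lengths_px)
    if n < 2:
        raise ValueError("need at least two detections to fit a KDE")
    sigma = statistics.stdev(lengths_px)
    # Normal-reference rule-of-thumb bandwidth (assumed), floored to avoid zero width.
    bandwidth = max(1.06 * sigma * n ** (-1 / 5), 1e-3)
    lo, hi = min(lengths_px), max(lengths_px)
    best_x, best_density = lo, -1.0
    # Evaluate the unnormalised density on a regular grid and keep the peak.
    for i in range(grid_points):
        x = lo + (hi - lo) * i / (grid_points - 1)
        density = sum(
            math.exp(-0.5 * ((x - length) / bandwidth) ** 2)
            for length in lengths_px
        )
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def estimate_gsd(lengths_px, reference_length_m=REFERENCE_VEHICLE_LENGTH_M):
    """Convert the modal pixel length to GSD in metres per pixel."""
    mode_px = modal_length_kde(lengths_px)
    return reference_length_m / mode_px


# Long sides of oriented bounding boxes, in pixels (made-up illustrative values).
detections_px = [30.0, 29.5, 30.5, 30.2, 29.8, 60.0, 12.0]
gsd = estimate_gsd(detections_px)
print(f"estimated GSD: {gsd:.3f} m/pixel")
```

Taking the KDE mode rather than the mean keeps the estimate anchored to the dominant cluster of detections: in the illustrative input, the boxes at 12 px and 60 px pull the peak only slightly away from the cluster near 30 px.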