Key Laboratory of Intelligent Processing and Application Technology of Satellite Information, Space Engineering University
Dec 25, 2025

SAREval: A Multi-Dimensional and Multi-Task Benchmark for Evaluating Visual Language Models on SAR Image Understanding

Ziyan Wang, Lei Liu, Gang Wan, Yuchen Lu, Fengjie Zheng, Guangde Sun, Yixiang Huang, Shihao Guo, Xinyi Li, Liang Yuan

AI Summary

The paper introduces SAREval, a novel benchmark for evaluating vision-language models (VLMs) on synthetic aperture radar (SAR) image understanding. It addresses the challenges posed by SAR's distinctive imaging mechanisms. SAREval comprises 20 tasks, ranging from image classification to physical-attribute inference, built on over 10,000 high-quality image-text pairs annotated with SAR-specific characteristics. Experiments on 11 mainstream VLMs reveal clear limitations in SAR image interpretation, particularly in fine-grained classification and in mapping visual features to physical parameters. These results highlight the need for VLMs developed specifically for SAR data.

Key Contribution

VLMs struggle badly when interpreting SAR imagery. They reach only 25.35% accuracy on fine-grained ship classification in SAREval, the first comprehensive benchmark designed specifically for SAR image understanding.

Abstract

Vision-Language Models (VLMs) demonstrate significant potential for remote sensing interpretation through multimodal fusion and semantic representation of imagery. However, adapting them to Synthetic Aperture Radar (SAR) remains challenging because SAR differs fundamentally from optical remote sensing in its imaging mechanisms and physical properties. SAREval, the first comprehensive benchmark specifically designed for SAR image understanding, incorporates SAR-specific characteristics, including scattering mechanisms and polarization features, through a hierarchical framework spanning perception, reasoning, and robustness capabilities. It encompasses 20 tasks, from image classification to physical-attribute inference, with over 10,000 high-quality image–text pairs. Extensive experiments on 11 mainstream VLMs reveal substantial limitations in SAR image interpretation. Models achieve merely 25.35% accuracy on fine-grained ship classification and have significant difficulty establishing mappings between visual features and physical parameters. Furthermore, some models exhibit unexpected performance improvements under certain noise conditions, a finding that challenges the conventional understanding of robustness. SAREval establishes an essential foundation for developing and evaluating VLMs in SAR image interpretation, providing standardized assessment protocols and quality-controlled annotations for cross-modal remote sensing research.
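The abstract reports per-task accuracy (for example, 25.35% on fine-grained ship classification) and noise-dependent changes in performance. As an illustration only, the sketch below shows one way such scores could be tabulated for a multiple-choice benchmark organized into perception, reasoning, and robustness dimensions. The `Record` schema, the task names such as `ship_fine_grained`, the noise label `noise_a`, and exact-match scoring on option letters are all hypothetical assumptions. They do not describe SAREval's actual data format or evaluation protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """One scored benchmark item (hypothetical schema, not SAREval's format)."""
    task: str        # illustrative task name, e.g. "ship_fine_grained"
    dimension: str   # "perception", "reasoning", or "robustness"
    condition: str   # "clean" or an illustrative noise label such as "noise_a"
    prediction: str  # option letter parsed from the model's answer
    answer: str      # ground-truth option letter


def accuracy(records: list[Record]) -> float:
    """Exact-match accuracy on option letters (case-insensitive); 0.0 if empty."""
    if not records:
        return 0.0
    correct = sum(
        r.prediction.strip().upper() == r.answer.strip().upper() for r in records
    )
    return correct / len(records)


def group_accuracy(records: list[Record], key: str) -> dict[str, float]:
    """Accuracy for each distinct value of a Record field, e.g. "task" or "dimension"."""
    groups: dict[str, list[Record]] = defaultdict(list)
    for r in records:
        groups[getattr(r, key)].append(r)
    return {k: accuracy(v) for k, v in groups.items()}


def robustness_delta(records: list[Record], task: str, noisy_condition: str) -> float:
    """Noisy-minus-clean accuracy on one task.

    A positive value means the model scored higher under noise, the kind of
    counter-intuitive effect the abstract reports for some models.
    """
    clean = [r for r in records if r.task == task and r.condition == "clean"]
    noisy = [r for r in records if r.task == task and r.condition == noisy_condition]
    return accuracy(noisy) - accuracy(clean)


if __name__ == "__main__":
    # Toy data for illustration only; these are not SAREval results.
    demo = [
        Record("ship_fine_grained", "perception", "clean", "A", "B"),
        Record("ship_fine_grained", "perception", "clean", "C", "C"),
        Record("ship_fine_grained", "robustness", "noise_a", "C", "C"),
    ]
    print(group_accuracy(demo, "dimension"))
    print(robustness_delta(demo, "ship_fine_grained", "noise_a"))
```

Exact-match scoring on parsed option letters is assumed here for simplicity. Free-form model answers would need a separate answer-extraction step before scoring.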

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models


Year: 2025

Venue: Remote Sensing
