The paper introduces SAREval, a benchmark for evaluating vision-language models (VLMs) on synthetic aperture radar (SAR) image understanding, addressing the challenges posed by SAR's unique imaging mechanisms. SAREval comprises 20 tasks, ranging from image classification to physical-attribute inference, built on over 10,000 high-quality image-text pairs annotated with SAR-specific characteristics. Experiments on 11 mainstream VLMs reveal limitations in SAR image interpretation, particularly in fine-grained classification and in mapping visual features to physical parameters, highlighting the need for VLMs developed specifically for SAR data.
VLMs stumble badly when interpreting SAR imagery, reaching only 25.35% accuracy on fine-grained ship classification, according to SAREval, the first comprehensive benchmark designed to expose these limitations.
Vision-Language Models (VLMs) demonstrate significant potential for remote sensing interpretation through multimodal fusion and semantic representation of imagery. However, their adaptation to Synthetic Aperture Radar (SAR) remains challenging due to fundamental differences in imaging mechanisms and physical properties compared to optical remote sensing. SAREval, the first comprehensive benchmark specifically designed for SAR image understanding, incorporates SAR-specific characteristics, including scattering mechanisms and polarization features, through a hierarchical framework spanning perception, reasoning, and robustness capabilities. It encompasses 20 tasks, from image classification to physical-attribute inference, with over 10,000 high-quality image–text pairs. Extensive experiments conducted on 11 mainstream VLMs reveal substantial limitations in SAR image interpretation. Models achieve merely 25.35% accuracy on fine-grained ship classification and have significant difficulty establishing mappings between visual features and physical parameters. Furthermore, some models show unexpected performance improvements under certain noise conditions, which challenges the conventional understanding of robustness. SAREval establishes an essential foundation for developing and evaluating VLMs in SAR image interpretation, providing standardized assessment protocols and quality-controlled annotations for cross-modal remote sensing research.
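The abstract reports accuracy per task within a hierarchy of capabilities (perception, reasoning, robustness). As a rough illustration of how such figures might be aggregated, the sketch below computes exact-match accuracy grouped by capability and by task. The record layout, task names, labels, and the exact-match scoring rule are all assumptions made for illustration; they are not taken from SAREval, whose actual data format and scoring protocol are not described here.

```python
from collections import defaultdict

# Hypothetical record layout: (capability, task, prediction, reference).
# The field order, task names, and labels are illustrative assumptions,
# not SAREval's actual schema or data.
records = [
    ("perception", "ship_classification", "cargo", "cargo"),
    ("perception", "ship_classification", "tanker", "cargo"),
    ("perception", "ship_classification", "cargo", "fishing"),
    ("perception", "ship_classification", "tanker", "tanker"),
    ("reasoning", "attribute_inference", "HH", "HV"),
    ("reasoning", "attribute_inference", "VV", "VV"),
]


def accuracy_by(records, key_index):
    """Return exact-match accuracy grouped by the field at key_index."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        key = rec[key_index]
        totals[key] += 1
        # Exact string match is an assumed scoring rule.
        hits[key] += int(rec[2] == rec[3])
    return {key: hits[key] / totals[key] for key in totals}


per_capability = accuracy_by(records, 0)  # grouped by capability level
per_task = accuracy_by(records, 1)        # grouped by individual task
```

Grouping by a field index keeps the same function usable at either level of the hierarchy; a real evaluation would likely also need answer normalization before comparing predictions to references.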