The paper introduces ChatENV, an interactive vision-language model designed for environmental monitoring that reasons over satellite image pairs and real-world sensor data. The authors construct a 177k-image dataset of temporal pairs with sensor metadata and stylistically diverse captions generated by GPT-4o and Gemini 2.0. Fine-tuned from Qwen-2.5-VL with LoRA adapters, ChatENV demonstrates strong performance in temporal and scenario-based reasoning, rivaling or outperforming existing temporal models.
Forget static captions: ChatENV lets you chat with satellite images and sensor data to simulate environmental scenarios.
Understanding environmental changes from remote sensing imagery is vital for climate resilience, urban planning, and ecosystem monitoring. Yet current vision-language models (VLMs) overlook causal signals from environmental sensors, rely on single-source captions prone to stylistic bias, and lack interactive scenario-based reasoning. We present ChatENV, the first interactive VLM that jointly reasons over satellite image pairs and real-world sensor data. Our framework (i) creates a 177k-image dataset forming 152k temporal pairs across 62 land-use classes in 197 countries with rich sensor metadata (e.g., temperature, PM10, CO); (ii) annotates the data using GPT-4o and Gemini 2.0 for stylistic and semantic diversity; and (iii) fine-tunes Qwen-2.5-VL with efficient Low-Rank Adaptation (LoRA) adapters for chat-style interaction. ChatENV achieves strong performance in temporal and "what-if" reasoning (e.g., BERT F1 of 0.902) and rivals or outperforms state-of-the-art temporal models while supporting interactive scenario-based analysis. This positions ChatENV as a powerful tool for grounded, sensor-aware environmental monitoring.
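As a concrete illustration of step (iii), the sketch below shows how LoRA adapters might be attached to a Qwen-2.5-VL checkpoint using Hugging Face transformers and peft. The base checkpoint, rank, alpha, and target modules here are illustrative assumptions; the abstract does not report the paper's exact settings.

    # Minimal sketch: attaching LoRA adapters to Qwen-2.5-VL for chat-style
    # fine-tuning. Checkpoint and hyperparameters are assumptions, not the
    # paper's published configuration.
    import torch
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from peft import LoraConfig, get_peft_model

    BASE = "Qwen/Qwen2.5-VL-7B-Instruct"  # assumed base checkpoint

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        BASE, torch_dtype=torch.bfloat16
    )
    processor = AutoProcessor.from_pretrained(BASE)

    lora_config = LoraConfig(
        r=16,                   # assumed rank; not stated in the abstract
        lora_alpha=32,          # assumed scaling factor
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the low-rank adapters train

Because only the low-rank adapter weights receive gradients, this keeps fine-tuning far cheaper than updating the full model, which is the usual motivation for LoRA in a setup like this.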