This paper introduces LLaRS, a unified foundation model for multi-modal remote sensing image restoration and fusion, trained on a new million-scale dataset (LLaRS1M) with language prompts. LLaRS leverages Sinkhorn-Knopp optimal transport for band alignment and a mixture-of-experts architecture to handle diverse degradation types. Results demonstrate that LLaRS outperforms existing task-specific models and exhibits strong transfer learning capabilities with parameter-efficient finetuning.
A single foundation model can now handle a wide range of remote sensing restoration and fusion tasks, outperforming specialized models and paving the way for more generalizable remote sensing AI.
Remote sensing imagery suffers from clouds, haze, noise, resolution limits, and sensor heterogeneity. Existing restoration and fusion approaches train separate models per degradation type. In this work, we present the Language-conditioned Large-scale Remote Sensing restoration model (LLaRS), the first unified foundation model for multi-modal and multi-task remote sensing low-level vision. LLaRS employs Sinkhorn-Knopp optimal transport to align heterogeneous bands into semantically matched slots, routes features through three complementary mixture-of-experts layers (convolutional experts for spatial patterns, channel-mixing experts for spectral fidelity, and attention experts with low-rank adapters for global context), and stabilizes joint training via step-level dynamic weight adjustment. To train LLaRS, we construct LLaRS1M, a million-scale multi-task dataset spanning eleven restoration and enhancement tasks, integrating real paired observations and controlled synthetic degradations with diverse natural-language prompts. Experiments show that LLaRS consistently outperforms seven competitive models, and parameter-efficient finetuning experiments demonstrate strong transfer capability and adaptation efficiency on unseen data. Repo: https://github.com/yc-cui/LLaRS
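
The abstract names Sinkhorn-Knopp optimal transport as the band-alignment mechanism but gives no implementation details. Below is a minimal, illustrative sketch of log-domain entropic OT that softly assigns a variable number of band embeddings to a fixed set of semantic slots; the function name, shapes, cosine cost, and uniform marginals are all assumptions, not the paper's actual code.

```python
# Minimal sketch (not the official LLaRS code): Sinkhorn-Knopp iterations
# softly assigning B band embeddings to K fixed semantic slots.
import math
import torch

def sinkhorn_band_alignment(band_feats, slot_protos, eps=0.05, n_iters=50):
    """band_feats: (B, D) per-band embeddings; slot_protos: (K, D) slot
    prototypes. Returns a (B, K) transport plan with uniform marginals
    (assumed here) and (K, D) slot features as plan-weighted band mixtures."""
    band_n = torch.nn.functional.normalize(band_feats, dim=-1)
    slot_n = torch.nn.functional.normalize(slot_protos, dim=-1)
    cost = -band_n @ slot_n.T                      # (B, K) negative cosine sim
    B, K = cost.shape
    log_mu, log_nu = -math.log(B), -math.log(K)    # uniform row/col marginals
    log_Kmat = -cost / eps                         # Gibbs kernel, log domain
    f = torch.zeros(B)
    g = torch.zeros(K)
    for _ in range(n_iters):                       # alternating scaling updates
        f = eps * (log_mu - torch.logsumexp(log_Kmat + g[None, :] / eps, dim=1))
        g = eps * (log_nu - torch.logsumexp(log_Kmat + f[:, None] / eps, dim=0))
    plan = torch.exp(log_Kmat + f[:, None] / eps + g[None, :] / eps)  # (B, K)
    # Each semantic slot becomes a transport-weighted mixture of input bands.
    slots = (plan / plan.sum(dim=0, keepdim=True)).T @ band_feats     # (K, D)
    return plan, slots
```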
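
Similarly, here is a toy version of the three-branch expert routing described in the abstract (convolutional, channel-mixing, and low-rank-adapted attention experts). The soft image-level gate and every module choice below are illustrative guesses; the actual LLaRS architecture may route per token and differ in every detail.

```python
# Toy sketch (shapes and modules are assumptions, not the paper's design):
# a soft router mixing three expert families over a feature map.
import torch
import torch.nn as nn

class ThreeWayMoE(nn.Module):
    """Routes (N, C, H, W) features through a convolutional expert
    (spatial patterns), a channel-mixing expert (spectral fidelity),
    and a low-rank-adapted attention expert (global context)."""
    def __init__(self, c, rank=8, heads=4):
        super().__init__()
        self.conv_expert = nn.Conv2d(c, c, 3, padding=1)
        self.chan_expert = nn.Sequential(nn.Conv2d(c, c, 1), nn.GELU(),
                                         nn.Conv2d(c, c, 1))
        self.attn = nn.MultiheadAttention(c, heads, batch_first=True)
        # LoRA-style low-rank adapter on the attention output path.
        self.lora_down = nn.Linear(c, rank, bias=False)
        self.lora_up = nn.Linear(rank, c, bias=False)
        self.router = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(c, 3))

    def forward(self, x):
        n, c, h, w = x.shape
        gate = self.router(x).softmax(dim=-1)          # (N, 3) expert weights
        y_conv = self.conv_expert(x)
        y_chan = self.chan_expert(x)
        tokens = x.flatten(2).transpose(1, 2)          # (N, H*W, C)
        y_attn, _ = self.attn(tokens, tokens, tokens)
        y_attn = y_attn + self.lora_up(self.lora_down(y_attn))
        y_attn = y_attn.transpose(1, 2).reshape(n, c, h, w)
        experts = torch.stack([y_conv, y_chan, y_attn], dim=1)  # (N, 3, C, H, W)
        return x + (gate[:, :, None, None, None] * experts).sum(dim=1)
```

For example, `ThreeWayMoE(32)(torch.randn(2, 32, 16, 16))` returns a tensor of the same shape, with the gate deciding how much each expert family contributes per image.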
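
Finally, the abstract does not specify the step-level dynamic weight adjustment rule. One common recipe it could resemble is Dynamic Weight Averaging, which upweights tasks whose losses are shrinking slowly; the sketch below implements that known technique, not LLaRS's actual scheme.

```python
# Dynamic Weight Averaging sketch (an assumption about what "step-level
# dynamic weight adjustment" might look like; not the paper's method).
import torch

def dwa_weights(prev_losses, prev2_losses, temperature=2.0):
    """Per-task weights from the loss ratio of the two prior steps:
    tasks whose loss decreases slowly receive larger weights."""
    ratios = torch.tensor(prev_losses) / torch.tensor(prev2_losses)
    w = torch.softmax(ratios / temperature, dim=0)
    return len(prev_losses) * w   # scale so weights sum to the task count
```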