Mar 10, 2026 · arXiv:2603.09493

Evolving Prompt Adaptation for Vision-Language Models

Enming Zhang, Jiayang Li, Zhenyu Liu, Yang Li

AI Summary

This paper introduces EvoPrompt, a novel prompt learning framework for adapting VLMs to downstream tasks while mitigating catastrophic forgetting. EvoPrompt uses a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts and an evolutionary training strategy that decouples low-rank updates into directional and magnitude components. Feature Geometric Regularization (FGR) further stabilizes training by enforcing feature decorrelation. Experiments demonstrate state-of-the-art few-shot performance and improved preservation of zero-shot capabilities.
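The summary does not say how the Modality-Shared Prompt Projector is built. The following is a minimal, hypothetical reading under stated assumptions, not the authors' implementation. It assumes one learnable embedding per (modality, layer) living in a common "unified" space, and a single linear projector, shared by the text and vision branches and by every layer, that maps those embeddings to prompt vectors. It also assumes both encoders accept prompts of the same width and uses one prompt vector per layer rather than several tokens. The class name `SharedPromptProjector` and all of its parameters are illustrative inventions.

```python
import random

Vector = list[float]


def init_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.02) -> list[Vector]:
    """Small Gaussian initialization, as is common for learnable prompt parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def project(weight: list[Vector], x: Vector) -> Vector:
    """Compute weight @ x for a row-major weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


class SharedPromptProjector:
    """Hypothetical MPP sketch: per-layer unified embeddings, one projector for both modalities."""

    def __init__(self, num_layers: int, unified_dim: int, prompt_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # One unified-space embedding per (modality, layer); the per-layer rows form the hierarchy.
        self.embeddings = {
            modality: init_matrix(num_layers, unified_dim, rng)
            for modality in ("text", "vision")
        }
        # A single projector shared across both modalities and all layers (assumption).
        self.weight = init_matrix(prompt_dim, unified_dim, rng)

    def prompts(self, modality: str) -> list[Vector]:
        """Return one prompt vector per layer for the requested encoder branch."""
        return [project(self.weight, e) for e in self.embeddings[modality]]


if __name__ == "__main__":
    mpp = SharedPromptProjector(num_layers=3, unified_dim=4, prompt_dim=8)
    text_prompts = mpp.prompts("text")      # 3 layers, each an 8-dim prompt
    vision_prompts = mpp.prompts("vision")  # produced by the same projector weights
    print(len(text_prompts), len(text_prompts[0]))
```

Sharing `self.weight` means a gradient from either branch updates the same projector, which is one plausible way to tie the two modalities together through a common embedding space.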

Key Contribution

EvoPrompt mitigates catastrophic forgetting in VLMs by evolving prompts so that their early-learned semantic directions are preserved while only their magnitudes are adapted.
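One way to picture "preserving directions while adapting magnitudes" is a minimal sketch that borrows the column-wise magnitude/direction split used by DoRA and applies it to the low-rank update ΔW = B·A itself. This is an assumption for illustration; the listing does not give EvoPrompt's exact formulation, schedule, or normalization axis. The helper names `decompose_update`, `recompose`, and `magnitude_step` are hypothetical. In the sketched "evolution" phase the unit-norm direction is frozen and only the per-column magnitudes receive (externally computed) gradient steps.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def decompose_update(b: Matrix, a: Matrix) -> tuple[list[float], Matrix]:
    """Split delta = b @ a into per-column magnitudes and unit-norm column directions."""
    delta = matmul(b, a)
    rows, cols = len(delta), len(delta[0])
    mags: list[float] = []
    direction = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        norm = math.sqrt(sum(delta[i][j] ** 2 for i in range(rows)))
        mags.append(norm)
        for i in range(rows):
            # Zero columns get a zero direction instead of dividing by zero.
            direction[i][j] = delta[i][j] / norm if norm > 0 else 0.0
    return mags, direction


def recompose(mags: list[float], direction: Matrix) -> Matrix:
    """Rebuild the update column by column as magnitude * direction."""
    return [[mags[j] * direction[i][j] for j in range(len(mags))]
            for i in range(len(direction))]


def magnitude_step(mags: list[float], grad_mags: list[float], lr: float) -> list[float]:
    """Hypothetical evolution step: plain gradient descent on magnitudes; direction stays fixed."""
    return [m - lr * g for m, g in zip(mags, grad_mags)]


if __name__ == "__main__":
    mags, direction = decompose_update([[3.0], [4.0]], [[1.0, 2.0]])  # delta = [[3, 6], [4, 8]]
    print(mags)  # [5.0, 10.0]
    new_mags = magnitude_step(mags, grad_mags=[1.0, -2.0], lr=0.5)
    print(recompose(new_mags, direction))  # same directions, rescaled columns
```

Because `direction` is never touched after the decomposition, whatever semantic direction the update had learned before the freeze survives every later step; only how strongly it is applied changes.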

Abstract

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framework designed to explicitly steer the prompt trajectory for stable, knowledge-preserving fine-tuning. Specifically, our approach employs a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts from a unified embedding space. Critically, an evolutionary training strategy decouples low-rank updates into directional and magnitude components, preserving early-learned semantic directions while only adapting their magnitude, thus enabling prompts to evolve without discarding foundational knowledge. This process is further stabilized by Feature Geometric Regularization (FGR), which enforces feature decorrelation to prevent representation collapse. Extensive experiments demonstrate that EvoPrompt achieves state-of-the-art performance in few-shot learning while robustly preserving the original zero-shot capabilities of pre-trained VLMs.
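The abstract describes Feature Geometric Regularization only as enforcing feature decorrelation to prevent representation collapse. A common way to express that idea, used here as an assumed stand-in rather than the paper's actual FGR loss, is to penalize the squared off-diagonal entries of the feature correlation matrix computed over a batch, in the spirit of redundancy-reduction objectives. The function names and the weight `lam` are hypothetical.

```python
import math


def decorrelation_penalty(features: list[list[float]], eps: float = 1e-8) -> float:
    """Sum of squared off-diagonal correlations between feature dimensions over a batch.

    features: N vectors (N >= 2), each of dimension D.
    """
    n, d = len(features), len(features[0])
    means = [sum(f[k] for f in features) / n for k in range(d)]
    centered = [[f[k] - means[k] for k in range(d)] for f in features]
    # Unbiased per-dimension standard deviations.
    stds = [math.sqrt(sum(c[k] ** 2 for c in centered) / (n - 1)) for k in range(d)]
    penalty = 0.0
    for p in range(d):
        for q in range(d):
            if p == q:
                continue  # diagonal correlations are always 1 and are not penalized
            cov = sum(c[p] * c[q] for c in centered) / (n - 1)
            corr = cov / (stds[p] * stds[q] + eps)
            penalty += corr ** 2
    return penalty


def regularized_loss(task_loss: float, features: list[list[float]], lam: float = 0.1) -> float:
    """Hypothetical total objective: task loss plus a weighted decorrelation term."""
    return task_loss + lam * decorrelation_penalty(features)


if __name__ == "__main__":
    collapsed = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]                    # dim 2 = 2 * dim 1
    spread = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]       # uncorrelated dims
    print(decorrelation_penalty(collapsed), decorrelation_penalty(spread))
```

When two dimensions carry the same information (the `collapsed` batch) the penalty approaches its maximum for that pair, while independent dimensions contribute nothing, so minimizing it pushes features to spread across dimensions instead of collapsing onto a few.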

Computer Vision · Multimodal Models · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
