May 6, 2026 · arXiv:2605.04425

Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning

Yating Wang, Yaqi Zhao, Yongshun Gong, Yilong Yin, Haoliang Sun

AI Summary

The paper introduces Interpretable Prompt Learning (IPL), a hybrid framework for vision-language models that combines discrete semantic token selection with continuous prompt optimization to improve interpretability and accuracy. IPL formulates token selection as a submodular optimization problem to encourage human-understandable and semantically diverse tokens. By alternating between discrete token selection and continuous prompt tuning, IPL achieves improved interpretability without sacrificing downstream task performance.

Key Contribution

IPL makes learned prompts more interpretable without hurting accuracy: it alternates discrete semantic token selection with continuous prompt optimization, and it plugs into existing prompt learning methods.

Abstract

Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete prompt optimization improves interpretability, it usually depends on large external models, leading to high computational costs and limited scalability. In this paper, we propose Interpretable Prompt Learning (IPL), a hybrid framework that alternates between discrete semantic token selection and continuous prompt optimization. Specifically, IPL formulates semantic token selection as an approximate submodular optimization problem, encouraging tokens that are both human-understandable and semantically diverse. It further adopts an alternating optimization strategy to integrate discrete token selection with continuous prompt tuning, improving interpretability while preserving adaptability to downstream tasks. Our framework is plug-and-play, allowing seamless integration with existing prompt learning methods. Extensive experiments on multiple benchmarks show that IPL consistently improves both interpretability and accuracy across five representative prompt learning methods, providing an effective and scalable extension to existing frameworks.
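The abstract does not state the objective used for semantic token selection, so the following is only a minimal sketch of how selection under a submodular objective might look, not the authors' formulation. It assumes a per-token relevance score plus a facility-location coverage term that rewards diversity; the names `greedy_select`, `relevance`, `sim`, and `lam` are hypothetical. With non-negative scores and similarities this objective is monotone submodular, so the standard greedy rule carries the classical (1 - 1/e) approximation guarantee under a cardinality constraint.

```python
import numpy as np


def greedy_select(relevance, sim, k, lam=1.0):
    """Greedily pick k candidate tokens that approximately maximize

        F(S) = sum_{i in S} relevance[i] + lam * sum_j max_{i in S} sim[j, i]

    relevance: (n,) non-negative scores (hypothetical "how understandable
               or useful is token i" measure).
    sim:       (n, n) non-negative similarities between candidate tokens.
    Returns the selected indices in the order they were picked.
    """
    relevance = np.asarray(relevance, dtype=float)
    sim = np.asarray(sim, dtype=float)
    n = relevance.shape[0]

    chosen = np.zeros(n, dtype=bool)
    coverage = np.zeros(n)  # coverage[j] = max_{i in S} sim[j, i]; 0 when S is empty
    selected = []

    for _ in range(min(k, n)):
        # Marginal gain of adding candidate i: its own relevance plus how much
        # it raises the coverage of every token j beyond the current level.
        coverage_gain = np.maximum(sim - coverage[:, None], 0.0).sum(axis=0)
        gains = relevance + lam * coverage_gain
        gains[chosen] = -np.inf  # never pick the same token twice

        best = int(np.argmax(gains))
        selected.append(best)
        chosen[best] = True
        coverage = np.maximum(coverage, sim[:, best])

    return selected


if __name__ == "__main__":
    # Toy data: tokens 0 and 1 are near-duplicates, token 2 is distinct.
    relevance = [0.9, 0.85, 0.5, 0.1]
    sim = [
        [1.00, 0.95, 0.10, 0.00],
        [0.95, 1.00, 0.10, 0.00],
        [0.10, 0.10, 1.00, 0.00],
        [0.00, 0.00, 0.00, 1.00],
    ]
    print(greedy_select(relevance, sim, k=2))  # [0, 2]
```

On the toy data, ranking by relevance alone would return the near-duplicate tokens 0 and 1. The coverage term makes the second pick token 2 instead, which is the kind of diversity the abstract describes. In an alternating scheme such a selection step would be interleaved with gradient updates of the continuous prompt; that part is not sketched here.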

Interpretability & Mechanistic Interp · Multimodal Models · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 54

Year: 2026

Venue: N/A
