This paper introduces Noise-Aware Multi-View Prompt alignment (NA-MVP), a novel framework for robust few-shot learning in the presence of noisy labels. NA-MVP uses multi-view prompts and unbalanced optimal transport to achieve fine-grained patch-to-prompt correspondence, suppressing unreliable regions and capturing both clean and noise-aware cues through a bi-directional prompt design. An alignment-guided selective refinement strategy further corrects mislabeled samples, leading to state-of-the-art performance on noisy few-shot learning benchmarks.
Even with noisy labels, NA-MVP achieves robust few-shot learning by adaptively separating clean from noisy signals using bi-directional multi-view prompt alignment.
Vision-language models offer strong few-shot capability through prompt tuning but remain vulnerable to noisy labels, which can corrupt prompts and degrade cross-modal alignment. Existing approaches struggle because they often lack the ability to model fine-grained semantic cues and to adaptively separate clean from noisy signals. To address these challenges, we propose NA-MVP, a framework for Noise-Aware few-shot learning through bi-directional Multi-View Prompt alignment. NA-MVP is built upon a key conceptual shift: robust prompt learning requires moving from global matching to region-aware alignment that explicitly distinguishes clean cues from noisy ones. To realize this, NA-MVP employs (1) multi-view prompts combined with unbalanced optimal transport to achieve fine-grained patch-to-prompt correspondence while suppressing unreliable regions; (2) a bi-directional prompt design that captures complementary clean-oriented and noise-aware cues, enabling the model to focus on stable semantics; and (3) an alignment-guided selective refinement strategy that uses optimal transport to correct only mislabeled samples while retaining reliable data. Experiments on synthetic and real-world noisy benchmarks demonstrate that NA-MVP consistently outperforms state-of-the-art baselines, confirming its effectiveness in enabling robust few-shot learning under noisy supervision.
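The unbalanced optimal transport step can be illustrated with a minimal sketch. This is not the paper's implementation: it is a generic entropic Sinkhorn scheme with KL-relaxed marginals, and the toy patch/prompt similarities, the `rho` relaxation parameter, and the function name are all illustrative assumptions. The point it demonstrates is the one the abstract makes: unlike balanced OT, the unbalanced plan is free to drop mass, so a patch that matches no prompt well (an unreliable region) receives less transport mass.

```python
import numpy as np

def unbalanced_sinkhorn(C, a, b, eps=0.1, rho=0.1, n_iters=500):
    """Entropic unbalanced OT via Sinkhorn-style scaling (illustrative sketch).

    C    : (n, m) cost matrix, e.g. 1 - cosine similarity of patches vs. prompts
    a, b : source/target marginals (patch and prompt mass)
    eps  : entropic regularization strength
    rho  : marginal relaxation; rho -> inf recovers balanced OT,
           small rho lets the plan shed mass on poorly matching entries
    """
    K = np.exp(-C / eps)            # Gibbs kernel
    fi = rho / (rho + eps)          # KL-relaxation exponent (fi -> 1: balanced)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi     # partial scaling: marginals only softly enforced
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]   # transport plan P

# Toy example: 3 patches vs. 2 prompts; the last patch is dissimilar to every
# prompt, standing in for an unreliable region under label noise.
sim = np.array([[0.9, 0.1],
                [0.2, 0.8],
                [0.1, 0.1]])
C = 1.0 - sim
a = np.full(3, 1 / 3)           # uniform patch marginal
b = np.full(2, 1 / 2)           # uniform prompt marginal
P = unbalanced_sinkhorn(C, a, b)
row_mass = P.sum(axis=1)        # mass each patch keeps; the noisy patch keeps less
```

With `rho` large the same routine approaches balanced OT and every patch keeps its full marginal mass, so the relaxation parameter is what turns alignment into a soft, noise-aware selection over regions.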