Apr 16, 2026 · arXiv:2604.14958

Frequency-Enhanced Dual-Subspace Networks for Few-Shot Fine-Grained Image Classification

Meijia Wang, Guochao Wang, Haozhen Chu, Bin Yao, Weichuan Zhang, Yuan Wang, Junpo Yang

AI Summary

This paper introduces the Frequency-Enhanced Dual-Subspace Network (FEDSNet) for few-shot fine-grained image classification, addressing the limitations of spatial-domain-only methods, which are prone to texture bias and overfitting. FEDSNet uses the Discrete Cosine Transform (DCT) with low-pass filtering to isolate low-frequency structural components. It then employs Truncated Singular Value Decomposition (SVD) to construct two low-rank subspaces, one for spatial texture and one for frequency structure, and fuses their projection distances with an adaptive gating mechanism. Experiments on four benchmark datasets show strong classification performance and robustness, with results highly competitive with existing metric learning algorithms.
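The DCT low-pass step can be illustrated with a short sketch. This is a minimal NumPy illustration, not the authors' implementation: it assumes a (C, H, W) backbone feature map, an orthonormal 2D DCT-II applied per channel, and a square low-frequency mask whose size `keep` is a hypothetical hyperparameter. The paper's actual filter shape and cutoff are not stated here, and the function names `dct_matrix` and `low_pass_dct` are illustrative.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D: D @ x is the DCT of x, and D.T @ y inverts it."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # spatial sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)  # DC-row scaling that makes the matrix orthonormal
    return d


def low_pass_dct(features: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the lowest keep x keep DCT frequencies of each channel.

    features: (C, H, W) feature map (assumed layout, e.g. a CNN backbone output).
    keep: hypothetical cutoff; coefficients with row or column index >= keep are zeroed.
    """
    _, h, w = features.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT per channel: D_h @ X @ D_w^T
    coeffs = np.einsum("ij,cjk,lk->cil", dh, features, dw)
    # Square low-pass mask over the top-left (lowest-frequency) corner
    mask = np.zeros((h, w))
    mask[:keep, :keep] = 1.0
    # Inverse 2D DCT (orthonormal, so the inverse is the transpose): D_h^T @ C @ D_w
    return np.einsum("ji,cjk,kl->cil", dh, coeffs * mask, dw)
```

For example, `low_pass_dct(feats, keep=2)` on a (64, 5, 5) feature map returns an array of the same shape that retains only the four lowest-frequency DCT components of each channel, discarding fine, high-frequency detail. Building the DCT as an explicit matrix keeps the sketch dependent on NumPy alone; `scipy.fft.dctn(x, norm="ortho", axes=(1, 2))` computes the same forward transform.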

Key Contribution

By explicitly disentangling spatial textures from frequency-based structural features and then fusing them, FEDSNet achieves highly competitive few-shot fine-grained classification results, indicating that frequency information helps counter texture bias and improve robustness.

Abstract

Few-shot fine-grained image classification aims to recognize subcategories with high visual similarity using only a limited number of annotated samples. Existing metric learning-based methods typically rely solely on spatial domain features. Confined to this single perspective, models inevitably suffer from inherent texture biases, entangling essential structural details with high-frequency background noise. Furthermore, lacking cross-view geometric constraints, single-view metrics tend to overfit this noise, resulting in structural instability under few-shot conditions. To address these issues, this paper proposes the Frequency-Enhanced Dual-Subspace Network (FEDSNet). Specifically, FEDSNet utilizes the Discrete Cosine Transform (DCT) and a low-pass filtering mechanism to explicitly isolate low-frequency global structural components from spatial features, thereby suppressing background interference. Truncated Singular Value Decomposition (SVD) is employed to construct independent, low-rank linear subspaces for both spatial texture and frequency structural features. An adaptive gating mechanism is designed to dynamically fuse the projection distances from these dual views. This strategy leverages the structural stability of the frequency subspace to prevent the spatial subspace from overfitting to background features. Extensive experiments on four benchmark datasets (CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft) demonstrate that FEDSNet exhibits excellent classification performance and robustness, achieving highly competitive results compared to existing metric learning algorithms. Complexity analysis further confirms that the proposed network achieves a favorable balance between high accuracy and computational efficiency, providing an effective new paradigm for few-shot fine-grained visual recognition.
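The dual-subspace comparison and gated fusion described in the abstract can be sketched as follows, under stated assumptions. Each class's support features for one view are stacked as columns of a (d, n) matrix; the subspace basis is the top-`rank` left singular vectors from `numpy.linalg.svd`; the projection distance is the squared norm of the residual after orthogonal projection onto that basis; and the gate is a hypothetical per-query sigmoid over the concatenated spatial and frequency query features, with parameters `w` and `b` that would be learned in practice. All function names are illustrative rather than taken from the paper, and details such as mean-centering the support features or the exact gate parameterization may differ in FEDSNet.

```python
import numpy as np


def subspace_basis(support: np.ndarray, rank: int) -> np.ndarray:
    """Truncated-SVD basis: the top-`rank` left singular vectors of a (d, n) support matrix.

    Each column of `support` is one feature vector of a single class in a single view.
    `rank` is a hypothetical truncation level; it must not exceed min(d, n).
    """
    u, _, _ = np.linalg.svd(support, full_matrices=False)
    return u[:, :rank]  # (d, rank) with orthonormal columns


def projection_distance(query: np.ndarray, basis: np.ndarray) -> float:
    """Squared norm of the component of `query` that lies outside span(basis)."""
    residual = query - basis @ (basis.T @ query)
    return float(residual @ residual)


def query_gate(query_spa: np.ndarray, query_freq: np.ndarray, w: np.ndarray, b: float) -> float:
    """Hypothetical per-query gate in (0, 1): a sigmoid over both views' features."""
    z = float(w @ np.concatenate([query_spa, query_freq]) + b)
    return float(1.0 / (1.0 + np.exp(-z)))


def classify(query_spa, query_freq, supports_spa, supports_freq, rank, w, b):
    """Predict the class whose gated spatial/frequency subspace distance is smallest.

    supports_spa / supports_freq: one (d, n) support matrix per class for each view.
    Returns (predicted class index, list of fused distances per class).
    """
    g = query_gate(query_spa, query_freq, w, b)
    fused = []
    for s_spa, s_freq in zip(supports_spa, supports_freq):
        d_spa = projection_distance(query_spa, subspace_basis(s_spa, rank))
        d_freq = projection_distance(query_freq, subspace_basis(s_freq, rank))
        fused.append(g * d_spa + (1.0 - g) * d_freq)  # gated fusion of the two views
    return int(np.argmin(fused)), fused
```

Because this gate is computed once per query, every class's distance uses the same mix of the two views, so the gate shifts trust between views without favoring any particular class. With `w` set to zeros and `b = 0`, the gate equals 0.5 and the fused distance is an equal-weight average of the spatial and frequency distances.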

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 70

Year: 2026

Venue: N/A
