This paper introduces the Frequency-Enhanced Dual-Subspace Network (FEDSNet) for few-shot fine-grained image classification. It addresses the limitations of spatial-domain-only methods, which are prone to texture bias and overfitting. FEDSNet uses the Discrete Cosine Transform (DCT) to isolate low-frequency structural components. It then employs truncated Singular Value Decomposition (SVD) to construct two subspaces, one for spatial texture and one for frequency structure, and fuses their projection distances through an adaptive gating mechanism. Experiments on four benchmark datasets show that FEDSNet achieves highly competitive classification performance and robustness compared with existing metric learning algorithms.
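The DCT low-pass step can be illustrated with a minimal NumPy sketch. It rests on three assumptions:

- the filter acts on each channel of a backbone feature map of shape (C, H, W);
- it keeps a square block of the lowest `keep` × `keep` DCT coefficients;
- the result is mapped back to the spatial grid with the inverse DCT.

The helper names `dct_matrix` and `dct_lowpass`, the square mask, and the cutoff `keep` are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis D (n x n): coefficients = D @ x, inverse = D.T @ c."""
    k = np.arange(n)[:, None]            # frequency index (rows)
    i = np.arange(n)[None, :]            # sample index (columns)
    d = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] *= 1.0 / np.sqrt(2.0)        # rescale the DC row for orthonormality
    return d * np.sqrt(2.0 / n)

def dct_lowpass(feat: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the lowest `keep` x `keep` DCT frequencies of each channel.

    feat: (C, H, W) feature map. `keep` is a hypothetical cutoff, not a value
    from the paper. Returns the low-frequency part on the original spatial grid.
    """
    _, h, w = feat.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    # Forward 2-D DCT-II per channel: D_h @ X @ D_w^T
    coeffs = np.einsum("ij,cjk,lk->cil", dh, feat, dw)
    mask = np.zeros((h, w))
    mask[:keep, :keep] = 1.0             # low-pass: keep the top-left block
    coeffs = coeffs * mask
    # Inverse 2-D DCT: D_h^T @ C @ D_w (orthonormal, so the inverse is the transpose)
    return np.einsum("ji,cjk,kl->cil", dh, coeffs, dw)

# Usage: a random 64-channel feature map on a 7x7 grid
rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 7, 7))
low = dct_lowpass(feat, keep=3)
```

Because the DCT basis is orthonormal, setting `keep` to the full grid size reproduces the input exactly. Setting `keep=1` leaves only the per-channel mean (the DC term), which is the coarsest possible structural summary.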
By explicitly disentangling and fusing spatial textures with frequency-based structural features, FEDSNet achieves highly competitive few-shot fine-grained classification results. This indicates that frequency information helps overcome texture bias and improve robustness.
Few-shot fine-grained image classification aims to recognize visually similar subcategories from only a limited number of annotated samples. Existing metric-learning-based methods typically rely solely on spatial-domain features. Confined to this single view, such models suffer from an inherent texture bias that entangles essential structural details with high-frequency background noise. Furthermore, without cross-view geometric constraints, single-view metrics tend to overfit this noise, resulting in structural instability under few-shot conditions.

To address these issues, this paper proposes the Frequency-Enhanced Dual-Subspace Network (FEDSNet). FEDSNet applies the Discrete Cosine Transform (DCT) with a low-pass filtering mechanism to explicitly isolate low-frequency global structural components from spatial features, thereby suppressing background interference. Truncated Singular Value Decomposition (SVD) is then used to construct independent low-rank linear subspaces for the spatial texture features and the frequency structural features. An adaptive gating mechanism dynamically fuses the projection distances from these two views. This design uses the structural stability of the frequency subspace to keep the spatial subspace from overfitting to background features.

Extensive experiments on four benchmark datasets (CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft) show that FEDSNet delivers excellent classification performance and robustness, with highly competitive results compared with existing metric learning algorithms. Complexity analysis further confirms that the network strikes a favorable balance between accuracy and computational efficiency, providing an effective new paradigm for few-shot fine-grained visual recognition.
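The dual-subspace metric and gated fusion can be sketched with a standard affine-subspace classifier. Each class subspace is the class mean plus the top right singular vectors of the mean-centered support features. A query's distance is the squared norm of its residual after projection onto that affine subspace.

This is an illustration under stated assumptions, not the authors' implementation. Several details are placeholders for points the abstract does not specify:

- the per-sample feature vectors;
- the `rank` parameter;
- the gate, modeled here as a sigmoid of a hypothetical linear score `w_gate` over the concatenated query features.

```python
import numpy as np

def class_subspace(support: np.ndarray, rank: int):
    """Mean and top-`rank` orthonormal basis of one class's support features (K, D)."""
    mu = support.mean(axis=0)
    # Truncated SVD of the mean-centered support matrix; rows of vt are sorted
    # right singular vectors, so the first `rank` rows span the dominant subspace.
    _, _, vt = np.linalg.svd(support - mu, full_matrices=False)
    r = min(rank, vt.shape[0])
    return mu, vt[:r].T                        # basis: (D, r) with orthonormal columns

def projection_distance(query: np.ndarray, mu: np.ndarray, basis: np.ndarray):
    """Squared distance of each query row (Q, D) to the affine subspace mu + span(basis)."""
    diff = query - mu
    proj = diff @ basis @ basis.T              # component lying inside the subspace
    return np.sum((diff - proj) ** 2, axis=1)

def subspace_distances(support_by_class, query, rank):
    """(Q, N) projection distances, one column per class in the episode."""
    cols = [projection_distance(query, *class_subspace(s, rank)) for s in support_by_class]
    return np.stack(cols, axis=1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(d_spatial, d_freq, gate_logit):
    """Per-query convex blend of two (Q, N) distance matrices.

    gate_logit: (Q,) hypothetical gate scores; g near 1 trusts the spatial view,
    g near 0 trusts the frequency view.
    """
    g = sigmoid(gate_logit)[:, None]
    return g * d_spatial + (1.0 - g) * d_freq

# Usage: a synthetic 5-way 5-shot episode with 10 queries and 64-d features per view
rng = np.random.default_rng(0)
N, K, Q, D = 5, 5, 10, 64
spa_support = [rng.standard_normal((K, D)) for _ in range(N)]
frq_support = [rng.standard_normal((K, D)) for _ in range(N)]
spa_query, frq_query = rng.standard_normal((Q, D)), rng.standard_normal((Q, D))

d_spa = subspace_distances(spa_support, spa_query, rank=K - 1)
d_frq = subspace_distances(frq_support, frq_query, rank=K - 1)
w_gate = 0.01 * rng.standard_normal(2 * D)     # hypothetical gate weights
gate_logit = np.concatenate([spa_query, frq_query], axis=1) @ w_gate
logits = -gated_fusion(d_spa, d_frq, gate_logit)  # smaller distance -> higher score
pred = logits.argmax(axis=1)
```

Keeping the gate as a convex combination means the fused distance always lies between the two single-view distances. A query can therefore lean on the frequency view when its spatial features are dominated by background texture. With K support samples the centered matrix has rank at most K - 1, which is why the usage example sets `rank=K - 1`.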