Mar 16, 2026 · arXiv:2603.15553

Self-Distillation of Hidden Layers for Self-Supervised Representation Learning

Scott C. Lowe, Anthony Fuller, Sageev Oore, Evan Shelhamer, Graham W. Taylor

AI Summary

This paper introduces Bootleg, a self-supervised learning method that predicts latent representations from multiple hidden layers of a teacher network, bridging the gap between generative and predictive SSL approaches. By using a hierarchical objective, Bootleg captures features at varying levels of abstraction simultaneously, avoiding the instability of final-layer self-distillation in purely predictive methods. Bootleg achieves significant performance gains over baselines, including a +10% improvement over I-JEPA on ImageNet-1K classification.

Key Contribution

Ditch unstable final-layer self-distillation: Bootleg predicts latent representations from multiple hidden layers of a teacher network, improving ImageNet-1K classification by +10% over I-JEPA.

Abstract

The landscape of self-supervised learning (SSL) is currently dominated by generative approaches (e.g., MAE) that reconstruct raw low-level data, and predictive approaches (e.g., I-JEPA) that predict high-level abstract embeddings. While generative methods provide strong grounding, they are computationally inefficient for high-redundancy modalities like imagery, and their training objective does not prioritize learning high-level, conceptual features. Conversely, predictive methods often suffer from training instability due to their reliance on the non-stationary targets of final-layer self-distillation. We introduce Bootleg, a method that bridges this divide by tasking the model with predicting latent representations from multiple hidden layers of a teacher network. This hierarchical objective forces the model to capture features at varying levels of abstraction simultaneously. We demonstrate that Bootleg significantly outperforms comparable baselines (+10% over I-JEPA) on classification of ImageNet-1K and iNaturalist-21, and semantic segmentation of ADE20K and Cityscapes.
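The hierarchical objective described in the abstract can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the teacher update rule, the masking scheme, the prediction heads, or the loss, so the EMA teacher (`ema_update`), the zeroed-out masked tokens, the one-linear-head-per-layer design (`heads`), the mean-squared-error loss, and the layer choice `target_layers` are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, dim, depth = 8, 16, 4
target_layers = [0, 1, 3]  # hypothetical choice of teacher layers to predict


def init_stack(depth, dim):
    """Create a toy stack of square weight matrices, one per layer."""
    return [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(depth)]


def hidden_states(weights, x):
    """Run the toy stack and return the activation after every layer."""
    states, h = [], x
    for w in weights:
        h = np.tanh(h @ w)
        states.append(h)
    return states


def ema_update(teacher, student, momentum=0.99):
    """Assumed teacher rule: exponential moving average of student weights."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def multi_layer_distillation_loss(student, heads, teacher, x, mask):
    """Average, over several teacher layers, of the error on masked tokens."""
    # The teacher sees the full input and supplies targets from hidden layers.
    targets = hidden_states(teacher, x)
    # The student sees only visible tokens (masked ones are zeroed here,
    # which is a simplification of dropping them from the sequence).
    context = np.where(mask[:, None], 0.0, x)
    z = hidden_states(student, context)[-1]
    per_layer = []
    for head, layer in zip(heads, target_layers):
        pred = z @ head  # one hypothetical linear head per target layer
        err = (pred - targets[layer]) ** 2
        per_layer.append(float(err[mask].mean()))  # score masked tokens only
    return float(np.mean(per_layer)), per_layer


student = init_stack(depth, dim)
teacher = [w.copy() for w in student]
heads = [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in target_layers]

x = rng.normal(size=(num_tokens, dim))
mask = np.arange(num_tokens) >= num_tokens // 2  # mask the second half

loss, per_layer = multi_layer_distillation_loss(student, heads, teacher, x, mask)
teacher = ema_update(teacher, student)  # would follow each student step
print(f"loss={loss:.4f}, per-layer={[round(v, 4) for v in per_layer]}")
```

In a real training loop the student and heads would be updated by gradient descent on this loss while the teacher receives no gradients; the sketch only evaluates the objective once to show how targets from several depths enter a single scalar.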

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
