CAS · HZNU · NTU · Xiamen University · Apr 16, 2026 · arXiv:2604.15016

DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models

Jingyuan Wang, Meiyan Xu, Zhihao Jia, Chenyu Liu, Xinliang Zhou, Ziyu Jia, Yong Li, Fang Li, Junfeng Yao, Yi Ding

AI Summary

The paper introduces DLink, a knowledge distillation framework tailored to EEG foundation models (FMs). It targets a practical problem: these models are too computationally intensive to deploy on embedded BCI systems. DLink combines a dynamic router that aggregates the most relevant teacher layers, an EEG MiC student architecture that performs structured compression, and spectral distillation that aligns teacher and student representations in the frequency domain. On four EEG benchmarks, DLink lets compact student models approach FM performance with substantially smaller model size and lower inference cost.
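The router idea can be illustrated with a minimal sketch. It assumes the router scores each teacher layer from its mean-pooled hidden states, turns the scores into softmax weights, and returns the weighted sum of layers as the distillation target. The scoring vector `w` and the function name `route_teacher_layers` are hypothetical; the paper's actual router design may differ.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max())
    return e / e.sum()


def route_teacher_layers(hidden: np.ndarray, w: np.ndarray, b: float = 0.0):
    """Aggregate teacher layers with input-dependent weights (illustrative sketch).

    hidden: (num_layers, seq_len, dim) teacher hidden states for one EEG sample.
    w: (dim,) hypothetical router scoring vector (would be learned in practice).
    Returns the aggregated (seq_len, dim) target and the (num_layers,) weights.
    """
    pooled = hidden.mean(axis=1)      # (num_layers, dim): one summary vector per layer
    logits = pooled @ w + b           # (num_layers,): relevance score per layer
    weights = softmax(logits)         # non-negative and sums to 1
    # Contract the layer axis: sum_l weights[l] * hidden[l] -> (seq_len, dim)
    aggregated = np.tensordot(weights, hidden, axes=1)
    return aggregated, weights


rng = np.random.default_rng(0)
hidden = rng.standard_normal((12, 50, 64))  # e.g. 12 layers, 50 tokens, 64 dims
w = rng.standard_normal(64)
agg, weights = route_teacher_layers(hidden, w)
```

Because the weights are computed from the sample's own features, different inputs can emphasize different intermediate layers. This matches the motivation that task-relevant semantics may sit in intermediate layers rather than only in the last one.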

Key Contribution

DLink, a knowledge distillation framework for EEG foundation models, lets compact student models approach foundation-model performance at substantially lower model size and inference cost.

Abstract

EEG foundation models (FMs) achieve strong cross-subject and cross-task generalization but impose substantial computational and memory costs that hinder deployment on embedded BCI systems. Knowledge distillation is a natural solution; however, conventional methods fail for EEG FMs because task-relevant semantics are often distributed across intermediate layers, and aggressive dimensionality reduction can distort oscillatory structure via representational collapse and aliasing. To address these challenges, we propose DLink (Distilling Layer-wise and Dominant Knowledge), a unified framework for transferring knowledge from large EEG FMs to compact students with three key innovations: (1) a dynamic Router that adaptively aggregates teacher layers to capture dominant intermediate representations; (2) an EEG MiC student with a Mimic-then-Compress pipeline, which inherits high-dimensional teacher features and then applies structured spatio-temporal compression to avoid a heavy classification head; and (3) spectral distillation that aligns teacher-student representations in the frequency domain to regularize compression and mitigate aliasing and temporal jitter. Experiments on four EEG benchmarks show that DLink enables compact students to outperform lightweight baselines while approaching fully fine-tuned FM performance at substantially lower model size and inference cost.
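Spectral distillation can be sketched as a loss on frequency-domain features. The sketch assumes the student features have already been projected to the teacher's dimensionality, and that the loss compares log-magnitude spectra along the time axis. The name `spectral_distill_loss` and the choice of `rfft` with `log1p` magnitudes are illustrative assumptions; the abstract does not specify the exact transform or weighting.

```python
import numpy as np


def spectral_distill_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """MSE between log-magnitude spectra along time (illustrative sketch).

    student, teacher: (seq_len, dim) feature sequences of identical shape.
    The projection that maps student features to teacher dim is omitted.
    """
    if student.shape != teacher.shape:
        raise ValueError("student and teacher features must have the same shape")
    # Real FFT over the time axis: (seq_len // 2 + 1, dim) complex bins.
    s_mag = np.abs(np.fft.rfft(student, axis=0))
    t_mag = np.abs(np.fft.rfft(teacher, axis=0))
    # log1p compresses dynamic range so low-power bands still contribute.
    return float(np.mean((np.log1p(s_mag) - np.log1p(t_mag)) ** 2))


rng = np.random.default_rng(0)
teacher = rng.standard_normal((64, 16))
shifted = np.roll(teacher, 5, axis=0)   # circular time shift, a stand-in for jitter
noisy = teacher + 0.5 * rng.standard_normal(teacher.shape)

loss_shift = spectral_distill_loss(shifted, teacher)
loss_noise = spectral_distill_loss(noisy, teacher)
```

A circular shift changes only the phase of each DFT bin, so the magnitude-based loss for `shifted` is numerically zero, while a time-domain MSE between the same two arrays would be large. This illustrates why a frequency-domain objective can be more tolerant of temporal jitter. Real jitter is not exactly circular, so the invariance is only approximate in practice.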

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 35

Year: 2026

Venue: N/A
