The paper introduces EEG-FM-Audit, a systematic pipeline for evaluating EEG Foundation Models (FMs) that addresses limitations of current studies: opaque baseline tuning, unverified contributions of learning paradigms, and a lack of transparency in model decision-making. The pipeline combines ASHA-driven benchmarking, paradigm-level ablations, and neurophysiological probing (NPP), and is applied to four EEG-FMs and five supervised models across three datasets. The results show that properly tuned supervised baselines can match or outperform FMs with fewer parameters, and that the effectiveness of FM learning paradigms depends on dataset scale and architecture. NPP analysis further reveals how FMs leverage specific physiological features.
Impressive-looking EEG Foundation Models may be overrated: carefully tuned supervised baselines can match or outperform them with far fewer parameters.
Large EEG Foundation Models (FMs) have shown great potential for decoding EEG signals across diverse cognitive tasks. However, existing EEG-FM studies exhibit three critical limitations: opaque tuning of supervised baselines, unverified contributions of complex learning paradigms, and a lack of transparency in model decision-making. To address these, we propose EEG-FM-Audit, a comprehensive evaluation and analysis pipeline that systematizes the assessment of EEG-FMs. EEG-FM-Audit consists of three components: (1) a benchmarking protocol driven by ASHA (Asynchronous Successive Halving Algorithm) that ensures fair comparisons by optimizing supervised baselines transparently; (2) paradigm-level ablation studies that evaluate the effectiveness of the learning paradigms used in FMs; and (3) a neurophysiological probing (NPP) framework that examines whether FMs leverage valid temporal, spatial, and spectral EEG properties. We apply EEG-FM-Audit to four state-of-the-art EEG-FMs and five representative supervised models across three public datasets. Our results show that properly tuned supervised baselines can match or outperform advanced FMs while requiring significantly fewer parameters. Furthermore, the effectiveness of FM learning paradigms depends strongly on dataset scale and model architecture. Finally, NPP analysis shows how FMs rely on specific physiological features, providing a framework for more interpretable neural decoding.
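ASHA tunes hyperparameters by training many configurations on a small budget and promoting only the top 1/η of each "rung" to a budget η times larger, so weak configurations are discarded early. The abstract does not state the search space, budgets, or η used for the supervised baselines, so the following is only a minimal sketch under assumptions: a single worker simulating ASHA's promotion rule sequentially, a hypothetical `evaluate(config, resource)` callback (resource could be, e.g., training epochs), and a toy objective (`toy_loss`, also hypothetical) standing in for validation loss.

```python
import math
import random


def asha(sample_config, evaluate, min_resource=1, max_resource=27, eta=3,
         n_jobs=60, seed=0):
    """Sequentially simulate ASHA's promotion rule (one worker, no pending jobs).

    sample_config(rng) -> config; evaluate(config, resource) -> loss (lower is better).
    Returns (best_config, best_loss, rung) taken from the highest rung reached.
    """
    rng = random.Random(seed)
    max_rung = round(math.log(max_resource / min_resource, eta))
    rungs = [{} for _ in range(max_rung + 1)]        # rung k: config id -> loss
    promoted = [set() for _ in range(max_rung + 1)]  # ids already promoted out of rung k
    configs = []

    def get_job():
        # Prefer promotions, checking from the highest non-final rung downward.
        for k in range(max_rung - 1, -1, -1):
            ranked = sorted(rungs[k], key=rungs[k].get)
            for cid in ranked[: len(ranked) // eta]:  # top 1/eta of completed jobs
                if cid not in promoted[k]:
                    promoted[k].add(cid)
                    return cid, k + 1
        # No promotable config: sample a new one at the bottom rung.
        configs.append(sample_config(rng))
        return len(configs) - 1, 0

    for _ in range(n_jobs):
        cid, k = get_job()
        # Each evaluation retrains from scratch with the larger budget (no checkpoint reuse).
        rungs[k][cid] = evaluate(configs[cid], min_resource * eta ** k)

    top = max(k for k in range(max_rung + 1) if rungs[k])
    best = min(rungs[top], key=rungs[top].get)
    return configs[best], rungs[top][best], top


def sample_lr(rng):
    # Hypothetical search space: log-uniform learning rate in [1e-4, 1e-1].
    return {"lr": 10 ** rng.uniform(-4, -1)}


def toy_loss(config, epochs):
    # Toy stand-in for validation loss: best near lr = 10**-2.5, improves with epochs.
    return (math.log10(config["lr"]) + 2.5) ** 2 + 1.0 / epochs


cfg, loss, rung = asha(sample_lr, toy_loss)
print(f"lr={cfg['lr']:.2e}, loss={loss:.3f}, rung={rung}")
```

On a real cluster, `get_job` would be called whenever a worker becomes free, which is what makes ASHA asynchronous: promotions never wait for a whole rung to finish.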