This study introduces Hypnos, a multi-modal sleep foundation model that uses next-token prediction to learn compact representations from eight sensing modalities, including EEG, ECG, and respiratory signals. Trained on over 20,000 overnight polysomnography recordings, Hypnos outperforms existing foundation models across a range of benchmarks and matches strong supervised baselines in sleep stage classification while using 100 times less labeled data. Hypnos also generalizes to daytime physiology, surpassing a dedicated ECG foundation model at detecting atrial fibrillation, which supports next-token prediction as a self-supervised objective for multi-modal physiological signals.
Next-token prediction yields representations that match strong supervised baselines in sleep stage classification with 100 times less labeled data, and that generalize to daytime physiology, surpassing a dedicated ECG foundation model at atrial fibrillation detection.
Foundation models offer a promising route to compress multi-modal physiological signals into compact representations of human health, with broad applications across sleep medicine, cardiology, neurology and other healthcare domains. Existing models have typically been trained with masked-reconstruction or contrastive objectives. However, masked reconstruction may be poorly suited to the stochastic nature of these signals, while contrastive approaches rely on positive-pair definitions despite the semantic invariances of physiological signals being poorly understood. In this work, we show that next-token prediction is a simple and scalable alternative. We develop Hypnos, a multi-modal sleep foundation model trained using eight different sensing modalities (e.g. EEG, ECG, respiratory signals) drawn from over 20,000 overnight polysomnography recordings. We tokenize each modality into streams of discrete tokens using residual vector quantization, then train a large auto-regressive RQ-Transformer to jointly predict the next token across all modalities in parallel. After training, Hypnos can be applied to continuous streams of sensor data from any subset of supported modalities, generating embeddings for downstream tasks. Across a range of benchmarks, Hypnos significantly outperforms existing foundation models. In sleep stage classification, we match the performance of strong supervised baselines on held-out test sets whilst using \(100\times\) less labelled data. Hypnos even generalises to daytime physiology, surpassing a dedicated ECG foundation model at detecting atrial fibrillation. Our results demonstrate that next-token prediction is a strong self-supervised objective for representation learning from multi-modal physiological signals.
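The tokenization step described above, residual vector quantization, can be illustrated with a minimal NumPy sketch. This is not the Hypnos implementation: the function names `rvq_encode` and `rvq_decode`, the frame and codebook sizes, and the randomly initialized codebooks are all assumptions made for illustration (a real tokenizer would learn its codebooks from data). The sketch only shows the core idea: each level quantizes the residual left by the previous level, so every input frame becomes a short stack of discrete tokens.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize frames x of shape (n, d) with a list of codebooks, each (k, d).

    Returns integer codes of shape (n, num_levels): one token per level.
    """
    residual = x.astype(np.float64).copy()
    codes = []
    for cb in codebooks:
        # Squared distance from every residual to every codeword: shape (n, k).
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)        # nearest codeword per frame
        codes.append(idx)
        residual = residual - cb[idx]     # pass what is left to the next level
    return np.stack(codes, axis=1)

def rvq_decode(codes, codebooks):
    """Reconstruct frames by summing the selected codeword from each level."""
    return sum(cb[codes[:, level]] for level, cb in enumerate(codebooks))

# Hypothetical toy data: 16 signal frames with 4 features each.
rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 4))

# Hypothetical random codebooks (3 levels, 8 codewords each) with shrinking
# scale; these stand in for learned codebooks purely for demonstration.
codebooks = [rng.normal(size=(8, 4)) * (0.5 ** level) for level in range(3)]

codes = rvq_encode(frames, codebooks)   # discrete tokens, shape (16, 3)
recon = rvq_decode(codes, codebooks)    # approximate frames, shape (16, 4)
```

In a setup like the one the abstract describes, token stacks of this kind from each modality would form the discrete streams that an auto-regressive model is trained to predict; that model is not sketched here.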