Search papers, labs, and topics across Lattice.
This paper introduces a data-driven signal separation technique using a modified SoundStream tokenizer and an end-to-end transformer trained with cross-entropy loss. The tokenizer uses finite-scalar quantization (FSQ) instead of VQVAE and incorporates additional transformer layers. Applied to the MIT RF Challenge dataset, the method achieves a 122x reduction in bit-error rate (BER) compared to prior state-of-the-art techniques for separating a QPSK signal from 5G interference, demonstrating zero-shot generalization to unseen mixtures.
Beat the state-of-the-art in radio signal separation by 122x using a transformer trained on cross-entropy loss, and the same architecture could work for gravitational waves.
We study a problem of signal separation: estimating a signal of interest (SOI) contaminated by an unknown non-Gaussian background/interference. Given the training data consisting of examples of SOI and interference, we show how to build a fully data-driven signal separator. To that end we learn a good discrete tokenizer for SOI and then train an end-to-end transformer on a cross-entropy loss. Training with a cross-entropy shows substantial improvements over the conventional mean-squared error (MSE). Our tokenizer is a modification of Google's SoundStream, which incorporates additional transformer layers and switches from VQVAE to finite-scalar quantization (FSQ). Across real and synthetic mixtures from the MIT RF Challenge dataset, our method achieves competitive performance, including a 122x reduction in bit-error rate (BER) over prior state-of-the-art techniques for separating a QPSK signal from 5G interference. The learned representation adapts to the interference type without side information and shows zero-shot generalization to unseen mixtures at inference time, underscoring its potential beyond RF. Although we instantiate our approach on radio-frequency mixtures, we expect the same architecture to apply to gravitational-wave data (e.g., LIGO strain) and other scientific sensing problems that require data-driven modeling of background and noise.