The paper introduces T1, a CNN-Transformer hybrid architecture for multivariate time series imputation, designed to remain robust under diverse missing patterns and heavy missingness. T1 uses a Channel-Head Binding mechanism that creates a one-to-one correspondence between CNN channels and Transformer attention heads, enabling selective information transfer across variables. On 11 benchmark datasets, T1 achieves state-of-the-art performance, reducing MSE by 46% on average relative to the second-best baseline, with the largest gains under extreme sparsity (70% missing ratio).
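The one-to-one binding can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is in the linked repository). The `(variables, channels, features)` array layout, the per-head projection weights, and the function name `channel_head_binding_attention` are hypothetical. CNN layers, missingness masks, and any output projection are also omitted. What the sketch does show is the binding itself. The number of heads equals the number of channels, and head h builds its queries, keys, and values from channel h alone while attending across variables. Corrupting one channel therefore changes only that head's output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_head_binding_attention(feats, w_q, w_k, w_v):
    """Hypothetical sketch: cross-variable attention where head h sees only channel h.

    feats: (N, C, D) array, N variables, C CNN channels, D features per channel.
    w_q, w_k, w_v: (C, D, D) per-head projections, one head per channel (H == C).
    Returns outputs of shape (N, C, D) and attention maps of shape (C, N, N).
    """
    n_vars, n_channels, d = feats.shape
    assert w_q.shape == w_k.shape == w_v.shape == (n_channels, d, d)
    # Put channels first so that head c operates on channel c only: (C, N, D).
    x = feats.transpose(1, 0, 2)
    q = np.einsum("cnd,cde->cne", x, w_q)
    k = np.einsum("cnd,cde->cne", x, w_k)
    v = np.einsum("cnd,cde->cne", x, w_v)
    # Per-head scaled dot-product attention across variables: (C, N, N).
    scores = np.einsum("cnd,cmd->cnm", q, k) / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    out = np.einsum("cnm,cmd->cnd", attn, v)
    return out.transpose(1, 0, 2), attn

rng = np.random.default_rng(0)
N, C, D = 4, 8, 16  # toy sizes: 4 variables, 8 channels (= 8 heads), 16 features
feats = rng.standard_normal((N, C, D))
w_q, w_k, w_v = (rng.standard_normal((C, D, D)) / np.sqrt(D) for _ in range(3))
out, attn = channel_head_binding_attention(feats, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (4, 8, 16) (8, 4, 4)

# Simulate one corrupted temporal pattern by zeroing channel 3 for every variable.
corrupted = feats.copy()
corrupted[:, 3, :] = 0.0
out2, _ = channel_head_binding_attention(corrupted, w_q, w_k, w_v)
others = [c for c in range(C) if c != 3]
print(np.allclose(out[:, others], out2[:, others]))  # True: other heads unaffected
```

Because no step in this sketch mixes channels, the unaffected heads produce the same outputs after corruption. Any head-mixing or down-weighting logic a full model adds is outside its scope.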
Achieve state-of-the-art multivariate time series imputation by binding CNN channels to Transformer heads, improving robustness to missing data by selectively down-weighting unreliable temporal patterns.
Imputing missing values in multivariate time series remains challenging, especially under diverse missing patterns and heavy missingness. Existing methods perform suboptimally because corrupted temporal features hinder cross-variable information transfer, which amplifies reconstruction errors. Robust imputation requires both extracting temporal patterns from sparse observations within each variable and selectively transferring information across variables, yet current approaches excel at one while compromising the other. We introduce T1 (Time series imputation with 1-to-1 channel-head binding), a CNN-Transformer hybrid architecture that achieves robust imputation through Channel-Head Binding, a mechanism that creates a one-to-one correspondence between CNN channels and attention heads. This design enables selective information transfer: when missingness corrupts certain temporal patterns, their corresponding attention pathways are adaptively down-weighted based on the remaining observable patterns, while reliable cross-variable connections are preserved through the unaffected channels. Experiments on 11 benchmark datasets demonstrate that T1 achieves state-of-the-art performance, reducing MSE by 46% on average compared to the second-best baseline, with particularly strong gains under extreme sparsity (70% missing ratio). The model generalizes to unseen missing patterns without retraining and uses the same hyperparameter configuration across all datasets. The code is available at https://github.com/Oppenheimerdinger/T1.
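The evaluation figures above can be made concrete with a small standard-library sketch. These are MSE at a 70% missing ratio and a percentage reduction relative to a baseline. The sketch assumes the protocol common in imputation benchmarks: a fraction of the observed entries is hidden at random, and MSE is computed only on those hidden entries. The helper names (`mask_observed`, `masked_mse`, `relative_reduction`), the toy data, and the per-variable mean-fill baseline are all hypothetical. The paper's exact masking and averaging procedure may differ.

```python
import random

def mask_observed(series, missing_ratio, seed=0):
    """Hide a random fraction of the observed entries of a (time x variable) series.

    series: list of rows (time steps), each a list of floats; None marks an
    entry already missing in the raw data. Returns the masked copy and the
    sorted (t, v) positions that were hidden for scoring.
    """
    rng = random.Random(seed)
    observed = [(t, v) for t, row in enumerate(series)
                for v, x in enumerate(row) if x is not None]
    n_hide = round(missing_ratio * len(observed))
    hidden = set(rng.sample(observed, n_hide))
    masked = [[None if (t, v) in hidden else x for v, x in enumerate(row)]
              for t, row in enumerate(series)]
    return masked, sorted(hidden)

def masked_mse(truth, imputed, positions):
    # Mean squared error over the artificially hidden entries only.
    errs = [(truth[t][v] - imputed[t][v]) ** 2 for t, v in positions]
    return sum(errs) / len(errs)

def relative_reduction(mse_model, mse_baseline):
    # Fractional MSE reduction versus a baseline (0.46 means "46% lower").
    return 1.0 - mse_model / mse_baseline

# Toy data: 20 time steps x 3 variables, fully observed before masking.
series = [[float(t + v) for v in range(3)] for t in range(20)]
masked, positions = mask_observed(series, missing_ratio=0.7, seed=42)

# Hypothetical baseline: fill hidden entries with the variable's remaining mean.
means = []
for v in range(3):
    vals = [row[v] for row in masked if row[v] is not None]
    means.append(sum(vals) / len(vals) if vals else 0.0)
imputed = [[means[v] if x is None else x for v, x in enumerate(row)]
           for row in masked]
print(len(positions), masked_mse(series, imputed, positions))
print(round(relative_reduction(0.27, 0.50), 2))  # 0.46
```

Computing the error only on hidden-but-known entries is what allows imputation quality to be scored at all, since truly missing values have no ground truth.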