Feb 19, 2026 · arXiv:2602.17568

Be Wary of Your Time Series Preprocessing

Sofiane Ennadir, Tianze Wang, Oleg Smirnov, Sahar Asadi, Lele Cao

AI Summary

This paper presents a theoretical analysis of how instance-based (Standard) and global (Min-Max) scaling affect the expressivity of Transformer-based architectures for time series representation learning. The authors introduce an expressivity framework that quantifies a model's ability to distinguish between similar and dissimilar time series inputs in the representation space. Their theoretical bounds, together with empirical validation on classification and forecasting benchmarks, indicate that the choice of normalization can significantly influence representational capacity, that no single method consistently outperforms the others, and that omitting normalization sometimes improves performance.

Key Contribution

Time series Transformers can, in some cases, perform *worse* with common normalization techniques such as Min-Max or Standard scaling than with no normalization at all, which challenges routine preprocessing practice.

Abstract

Normalization and scaling are fundamental preprocessing steps in time series modeling, yet their role in Transformer-based models remains underexplored from a theoretical perspective. In this work, we present the first formal analysis of how different normalization strategies, specifically instance-based and global scaling, impact the expressivity of Transformer-based architectures for time series representation learning. We propose a novel expressivity framework tailored to time series, which quantifies a model's ability to distinguish between similar and dissimilar inputs in the representation space. Using this framework, we derive theoretical bounds for two widely used normalization methods: Standard and Min-Max scaling. Our analysis reveals that the choice of normalization strategy can significantly influence the model's representational capacity, depending on the task and data characteristics. We complement our theory with empirical validation on classification and forecasting benchmarks using multiple Transformer-based models. Our results show that no single normalization method consistently outperforms others, and in some cases, omitting normalization entirely leads to superior performance. These findings highlight the critical role of preprocessing in time series learning and motivate the need for more principled normalization strategies tailored to specific tasks and datasets.
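To make the two scaling regimes concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes that "instance-based" means statistics computed per series and "global" means statistics computed over the whole dataset, which the abstract does not spell out. The helpers `standard_scale`, `minmax_scale`, and `gap` are hypothetical names introduced only for illustration. The toy batch holds two series that differ only in amplitude and offset, so the sketch shows one way per-series scaling can erase a difference between inputs.

```python
import numpy as np

def standard_scale(x, axis=-1, eps=1e-8):
    """Subtract the mean and divide by the standard deviation along `axis`."""
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    return (x - mean) / (std + eps)

def minmax_scale(x, axis=-1, eps=1e-8):
    """Map values along `axis` to roughly [0, 1] using the min and max."""
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    return (x - lo) / (hi - lo + eps)

def gap(z):
    """Euclidean distance between the two series in a (2, T) batch."""
    return float(np.linalg.norm(z[0] - z[1]))

# Toy data (an assumption, not from the paper): same shape, different scale.
t = np.linspace(0.0, 2.0 * np.pi, 64)
base = np.sin(t)
batch = np.stack([base, 3.0 * base + 5.0])  # shape (2, 64)

# Assumed "instance-based" regime: statistics per series (last axis).
inst_std = standard_scale(batch, axis=-1)
inst_mm = minmax_scale(batch, axis=-1)

# Assumed "global" regime: statistics over the whole batch (axis=None).
glob_std = standard_scale(batch, axis=None)
glob_mm = minmax_scale(batch, axis=None)

for name, z in [("raw", batch), ("instance standard", inst_std),
                ("instance min-max", inst_mm), ("global standard", glob_std),
                ("global min-max", glob_mm)]:
    print(f"{name:>18}: distance between the two series = {gap(z):.4f}")
```

Under per-series scaling the two series become numerically indistinguishable, because an affine change of amplitude and offset cancels out, while under dataset-level scaling they stay apart. Whether that is helpful or harmful depends on whether amplitude carries signal for the task, which is consistent with the abstract's point that no single normalization method wins everywhere.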

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
