Apr 23, 2026

arXiv:2604.21930

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

Nicolae Filat, A. Hussain, K. Kalogiannis, Elena Burceanu

AI Summary

This paper investigates the impact of temporal taskification (the process of splitting a continuous data stream into discrete tasks) on streaming continual learning (CL) benchmarks. The authors introduce a framework based on plasticity/stability profiles and Boundary-Profile Sensitivity (BPS) that quantifies how different taskifications change the induced CL regime *before* any model is trained. Experiments on network traffic forecasting show that varying the taskification alone substantially alters forecasting error, forgetting, and backward transfer, which shows that this preprocessing step is not neutral in CL evaluation.
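To make "temporal taskification" concrete, here is a minimal sketch that assumes a daily-indexed stream cut into fixed-length windows, such as the 9-, 30-, and 44-day splits mentioned in the abstract. The function names `taskify` and `split_stream` and the `offset` parameter are hypothetical illustrations, not the paper's code, and the paper's actual splitting procedure may differ.

```python
def taskify(n_days, task_days, offset=0):
    """Split a daily-indexed stream of n_days days into contiguous tasks.

    Returns half-open (start_day, end_day) intervals. Boundaries fall at
    offset, offset + task_days, offset + 2*task_days, ...; offset=0 gives
    plain fixed-length windows. The final task may be shorter than task_days.
    """
    if n_days <= 0 or task_days <= 0:
        raise ValueError("n_days and task_days must be positive")
    if not 0 <= offset < task_days:
        raise ValueError("offset must lie in [0, task_days)")
    # Interior boundaries strictly inside (0, n_days); day 0 and n_days are added explicitly.
    inner = [b for b in range(offset, n_days, task_days) if b > 0]
    bounds = [0] + inner + [n_days]
    return list(zip(bounds[:-1], bounds[1:]))


def split_stream(values, intervals):
    """Slice a per-day sequence of observations into one list per task."""
    return [values[a:b] for a, b in intervals]


# The same hypothetical 90-day stream under three taskifications.
for task_days in (9, 30, 44):
    tasks = taskify(90, task_days)
    print(task_days, len(tasks), tasks[-1])
# 9 10 (81, 90)
# 30 3 (60, 90)
# 44 3 (88, 90)
```

Even in this toy setting the 44-day split leaves a 2-day trailing task, so the choice of window length also decides how the end of the stream is handled. Whether the paper keeps, merges, or drops such remainders is not stated on this page.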

Key Contribution

Seemingly innocuous choices about how to split a continuous data stream into discrete tasks can substantially alter the conclusions of continual learning benchmarks, and their effect on the induced learning regime can be diagnosed before any model is trained.
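The benchmark conclusions at stake are summarized by metrics such as forgetting and backward transfer. The sketch below computes them from an error matrix using commonly used accuracy-based definitions, with signs flipped because forecasting error is lower-is-better. The function name `forgetting_and_bwt` and this sign convention are assumptions; the paper's exact metric definitions are not given on this page and may differ.

```python
def forgetting_and_bwt(err):
    """Average forgetting and backward transfer from an error matrix.

    err[i][j] is the forecasting error on task j measured after training
    sequentially through task i (lower is better). Sign convention (an
    assumption): positive forgetting means error grew on old tasks, and
    positive BWT means later training reduced error on old tasks.
    Entries above the diagonal (tasks not yet seen) are ignored.
    """
    T = len(err)
    if T < 2 or any(len(row) != T for row in err):
        raise ValueError("err must be a square matrix with at least two tasks")
    final = err[T - 1]
    forgetting, bwt = [], []
    for j in range(T - 1):
        # Lowest error on task j after learning it but before the last task.
        best = min(err[l][j] for l in range(j, T - 1))
        forgetting.append(final[j] - best)
        # Error right after learning task j versus error at the end of the stream.
        bwt.append(err[j][j] - final[j])
    return sum(forgetting) / (T - 1), sum(bwt) / (T - 1)


# Hypothetical errors for three tasks (rows: after training on task i).
err = [
    [1.0, 5.0, 5.0],
    [1.5, 1.0, 4.0],
    [2.0, 1.2, 1.0],
]
f, b = forgetting_and_bwt(err)
print(round(f, 3), round(b, 3))  # 0.6 -0.6
```

Because each task boundary determines which observations land in each row and column of this matrix, re-taskifying the same stream changes the matrix and therefore both summary numbers.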

Abstract

Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same stream can induce different CL regimes and therefore different benchmark conclusions. To study this effect, we introduce a taskification-level framework based on plasticity and stability profiles, a profile distance between taskifications, and Boundary-Profile Sensitivity (BPS), which diagnoses how strongly small boundary perturbations alter the induced regime before any CL model is trained. We evaluate continual finetuning, Experience Replay, Elastic Weight Consolidation, and Learning without Forgetting on network traffic forecasting with CESNET-Timeseries24, keeping the stream, model, and training budget fixed while varying only the temporal taskification. Across 9-, 30-, and 44-day splits, we observe substantial changes in forecasting error, forgetting, and backward transfer, showing that taskification alone can materially affect CL evaluation. We further find that shorter taskifications induce noisier distribution-level patterns, larger structural distances, and higher BPS, indicating greater sensitivity to boundary perturbations. These results show that benchmark conclusions in streaming CL depend not only on the learner and the data stream, but also on how that stream is taskified, motivating temporal taskification as a first-class evaluation variable.
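This page does not define the plasticity/stability profiles, the profile distance, or BPS, so the following is only a hypothetical proxy for the idea of boundary-profile sensitivity, not the authors' method. It swaps in a simple "drift profile" (the absolute change in mean value across each task boundary), compares profiles by mean absolute difference, and averages that distance over small shifts of every interior boundary. All function names (`shifted_boundaries`, `drift_profile`, `profile_distance`, `boundary_sensitivity`) are illustrative assumptions.

```python
from statistics import mean


def shifted_boundaries(n_days, task_days, shift=0):
    """Fixed-length task boundaries with every interior boundary moved by `shift` days."""
    inner = [b + shift for b in range(task_days, n_days, task_days)]
    # Keep only boundaries that still fall strictly inside the stream.
    inner = [b for b in inner if 0 < b < n_days]
    return [0] + inner + [n_days]


def drift_profile(values, bounds):
    """Hypothetical per-boundary drift proxy: |mean(next task) - mean(previous task)|."""
    means = [mean(values[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]
    return [abs(m_next - m_prev) for m_prev, m_next in zip(means, means[1:])]


def profile_distance(p, q):
    """Mean absolute difference between two profiles over their common prefix."""
    pairs = list(zip(p, q))
    if not pairs:
        raise ValueError("profiles need at least one boundary each")
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def boundary_sensitivity(values, task_days, max_shift):
    """Average profile distance between the base taskification and every
    taskification whose boundaries are shifted by 1..max_shift days either way."""
    if not 0 < max_shift < task_days:
        raise ValueError("require 0 < max_shift < task_days")
    n_days = len(values)
    base = drift_profile(values, shifted_boundaries(n_days, task_days))
    shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
    return mean(
        profile_distance(base, drift_profile(values, shifted_boundaries(n_days, task_days, s)))
        for s in shifts
    )


# A flat stream is insensitive; a stream with one level change at day 30 is not.
flat = [5.0] * 60
step = [0.0] * 30 + [1.0] * 30
print(boundary_sensitivity(flat, 10, 2))            # 0.0
print(round(boundary_sensitivity(step, 10, 2), 3))  # 0.06
```

Shifting all interior boundaries by the same amount keeps the number of tasks (and hence the profile length) mostly fixed, which keeps the comparison simple; a shift that pushes the last boundary off the end of the stream is handled by comparing only the common prefix. The paper's BPS may use different perturbations, profiles, and distances.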

Eval Frameworks & Benchmarks

Citation Metrics

Citations: 0

Influential citations: 0

References: 38

Year: 2026

Venue: N/A
