This paper introduces three new metrics to disentangle model adaptation from inherent data difficulty when evaluating temporal distribution shift: Adaptation Error, Intrinsic Error, and Adaptation Time. These metrics provide a more granular understanding of model performance by quantifying how quickly and effectively models adapt to changes in the data distribution over time. Experiments demonstrate that these metrics reveal adaptation patterns obscured by traditional average performance measures, leading to a more nuanced assessment of temporal robustness.
Temporal performance drops? It's not always the model's fault: these new metrics disentangle adaptation from *inherent* data difficulty, revealing hidden patterns in how models handle evolving data.
Evaluating robustness under temporal distribution shift remains an open challenge. Existing metrics quantify the average decline in performance but fail to capture how models adapt to evolving data. As a result, temporal degradation is often misinterpreted: when accuracy declines, it is unclear whether the model is failing to adapt or whether the data itself has become inherently more challenging to learn. In this work, we propose three complementary metrics to distinguish adaptation from intrinsic difficulty in the data. Together, these metrics provide a dynamic and interpretable view of model behavior under temporal distribution shift. Results show that our metrics uncover adaptation patterns hidden by existing analyses, offering a richer understanding of temporal robustness in evolving environments.