Feb 18, 2026 · arXiv:2602.16681

VETime: Vision Enhanced Zero-Shot Time Series Anomaly Detection

Yingyuan Yang, Tian Lan, Yifei Gao, Yimeng Lu, Wenjun He, Chenghao Liu, Chen Zhang

AI Summary

The paper introduces VETime, a novel zero-shot time series anomaly detection (TSAD) framework that unifies temporal and visual modalities to address the limitations of existing 1D and 2D foundation models. VETime uses Reversible Image Conversion and Patch-Level Temporal Alignment to create a shared visual-temporal timeline, preserving fine-grained details and temporal sensitivity. By incorporating Anomaly Window Contrastive Learning and Task-Adaptive Multi-Modal Fusion, VETime achieves state-of-the-art zero-shot TSAD performance with improved localization precision and reduced computational cost.

Key Contribution

VETime unifies temporal and visual modalities through fine-grained visual-temporal alignment and dynamic fusion, and reports state-of-the-art zero-shot time series anomaly detection performance.

Abstract

Time-series anomaly detection (TSAD) requires identifying both immediate Point Anomalies and long-range Context Anomalies. However, existing foundation models face a fundamental trade-off: 1D temporal models provide fine-grained pointwise localization but lack a global contextual perspective, while 2D vision-based models capture global patterns but suffer from information bottlenecks due to a lack of temporal alignment and coarse-grained pointwise detection. To resolve this dilemma, we propose VETime, the first TSAD framework that unifies temporal and visual modalities through fine-grained visual-temporal alignment and dynamic fusion. VETime introduces a Reversible Image Conversion and a Patch-Level Temporal Alignment module to establish a shared visual-temporal timeline, preserving discriminative details while maintaining temporal sensitivity. Furthermore, we design an Anomaly Window Contrastive Learning mechanism and a Task-Adaptive Multi-Modal Fusion to adaptively integrate the complementary perceptual strengths of both modalities. Extensive experiments demonstrate that VETime significantly outperforms state-of-the-art models in zero-shot scenarios, achieving superior localization precision with lower computational overhead than current vision-based approaches. Code available at: https://github.com/yyyangcoder/VETime.
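The abstract does not spell out how Reversible Image Conversion or Patch-Level Temporal Alignment are implemented. As a rough illustration of the general idea only, the Python sketch below folds a 1D series into a 2D grid that can be unfolded without loss, and maps each image patch back to the time steps it covers. The function names, the row-major layout, and the zero padding are all assumptions made for this example; they are not taken from the paper or its repository.

```python
from typing import List, Tuple


def series_to_image(series: List[float], width: int) -> Tuple[List[List[float]], int]:
    """Fold a 1D series row by row into a 2D grid (hypothetical layout).

    The last row is zero-padded; the original length is returned so the
    conversion can be undone exactly.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(series)
    pad = (-n) % width  # number of padding values needed to fill the last row
    padded = list(series) + [0.0] * pad
    image = [padded[i:i + width] for i in range(0, len(padded), width)]
    return image, n


def image_to_series(image: List[List[float]], n: int) -> List[float]:
    """Invert series_to_image: flatten row by row and drop the padding."""
    flat = [value for row in image for value in row]
    return flat[:n]


def patch_time_indices(row: int, col: int, patch: int, width: int, n: int) -> List[int]:
    """Return the time steps covered by the patch at grid position (row, col).

    Assumes square patches of side `patch` and the row-major layout above.
    Pixels that fall in the padding or outside the grid are skipped.
    """
    indices = []
    for r in range(row * patch, (row + 1) * patch):
        for c in range(col * patch, (col + 1) * patch):
            t = r * width + c
            if c < width and t < n:
                indices.append(t)
    return indices


# Small usage example: 10 time steps folded into rows of 4 pixels.
series = [float(i) for i in range(10)]
image, length = series_to_image(series, width=4)
restored = image_to_series(image, length)
top_left_patch = patch_time_indices(0, 0, patch=2, width=4, n=length)
```

Under these assumptions, a patch-level visual feature can be associated with a specific set of time indices, which is one simple way to read the "shared visual-temporal timeline" mentioned above. The conversion described in the paper may differ substantially.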

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 33
Year: 2026
Venue: N/A
