Mar 12, 2026 | arXiv:2603.11479

Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents

Sky Chenwei Wan, T. Hou, Yifei Wang, Xiqing Chang, Aymeric Jan

AI Summary

The paper introduces Knowledge-Guided Time Series Event Detection (TSED), a new setting in which a model must detect events in multivariate time series from natural-language descriptions, with little or no training data. To address this setting, the authors propose the Event Logic Tree (ELT), a knowledge representation framework that bridges linguistic descriptions and time series data by modeling the temporal-logic structure of events. They then present a neuro-symbolic VLM agent framework that instantiates primitives from signal visualizations and composes them under ELT constraints, producing both detected intervals and explanations.

Key Contribution

Forget black-box anomaly detection: this neuro-symbolic VLM agent uses natural-language descriptions and visual grounding to explain *why* an event was detected in multivariate time series data, even with little or no training data.

Abstract

Time Series Event Detection (TSED) has long been an important task with critical applications across many high-stakes domains. Unlike statistical anomalies, events are defined by semantics with complex internal structures, which are difficult to learn inductively from scarce labeled data in real-world settings. In light of this, we introduce Knowledge-Guided TSED, a new setting where a model is given a natural-language event description and must ground it to intervals in multivariate signals with little or no training data. To tackle this challenge, we introduce Event Logic Tree (ELT), a novel knowledge representation framework to bridge linguistic descriptions and physical time series data via modeling the intrinsic temporal-logic structures of events. Based on ELT, we present a neuro-symbolic VLM agent framework that iteratively instantiates primitives from signal visualizations and composes them under ELT constraints, producing both detected intervals and faithful explanations in the form of instantiated trees. To validate the effectiveness of our approach, we release a benchmark based on real-world time series data with expert knowledge and annotations. Experiments and human evaluation demonstrate the superiority of our method compared to supervised fine-tuning baselines and existing zero-shot time series reasoning frameworks based on LLMs/VLMs. We also show that ELT is critical in mitigating VLMs' inherent hallucination in matching signal morphology with event semantics.
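To make the idea of composing primitives under temporal-logic constraints more concrete, the following is a minimal Python sketch. It is not the authors' ELT definition: the node types (`Primitive`, `Overlap`, `Sequence`), the `max_gap` parameter, and the example signal names are all hypothetical, and the candidate intervals are supplied by hand where the paper's agent would instantiate them from signal visualizations with a VLM.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in arbitrary time units


@dataclass
class Primitive:
    """Leaf node: a named signal pattern with candidate intervals.

    Hypothetical stand-in: candidates are hard-coded here, whereas the
    paper describes a VLM instantiating primitives from signal plots.
    """
    name: str
    candidates: List[Interval]

    def detect(self) -> List[Interval]:
        return sorted(self.candidates)


@dataclass
class Overlap:
    """Assumed node type: all children must co-occur.

    The result is the intersection of one interval from each child.
    """
    children: list

    def detect(self) -> List[Interval]:
        results = self.children[0].detect()
        for child in self.children[1:]:
            results = [
                (max(s, s2), min(e, e2))
                for (s, e) in results
                for (s2, e2) in child.detect()
                if max(s, s2) < min(e, e2)  # keep non-empty intersections only
            ]
        return sorted(set(results))


@dataclass
class Sequence:
    """Assumed node type: children must occur in order.

    Each child must start after the previous one ends, within max_gap.
    The result spans from the first start to the last end.
    """
    children: list
    max_gap: float = float("inf")

    def detect(self) -> List[Interval]:
        results = self.children[0].detect()
        for child in self.children[1:]:
            results = [
                (s, e2)
                for (s, e) in results
                for (s2, e2) in child.detect()
                if 0 <= s2 - e <= self.max_gap
            ]
        return sorted(set(results))


# Hypothetical event: "pressure rises while flow stays flat, then pressure drops".
rise = Primitive("pressure rises", [(0.0, 2.0), (10.0, 12.0)])
flat = Primitive("flow stays flat", [(1.0, 3.0)])
drop = Primitive("pressure drops", [(3.5, 5.0), (20.0, 21.0)])

event = Sequence([Overlap([rise, flat]), drop], max_gap=2.0)
print(event.detect())  # [(1.0, 5.0)]
```

In this toy setup the tree itself doubles as the explanation: each detected interval can be traced back to the primitive intervals that satisfied the constraints. A spurious match such as the second "pressure rises" candidate is discarded because no co-occurring "flow stays flat" interval supports it.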

Interpretability & Mechanistic Interp | Multimodal Models | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 25

Year: 2026

Venue: N/A
