This paper introduces Embedding-Aware Feature Discovery (EAFD), a framework that uses a self-reflective LLM agent to generate interpretable features from event sequences, guided by alignment with pretrained embeddings and complementarity to capture missing predictive signals. EAFD iteratively discovers, evaluates, and refines features, effectively bridging the gap between learned embeddings and traditional feature-based pipelines. Experiments on open-source and industrial transaction benchmarks demonstrate that EAFD outperforms both embedding-only and feature-based baselines, achieving relative gains of up to 5.8% over state-of-the-art pretrained embeddings.
Forget hand-crafted features: this system uses an LLM to automatically discover features from event sequences that outperform even state-of-the-art embeddings by up to 5.8%.
Industrial financial systems operate on temporal event sequences such as transactions, user actions, and system logs. While recent research emphasizes representation learning and large language models, production systems continue to rely heavily on handcrafted statistical features due to their interpretability, robustness under limited supervision, and strict latency constraints. This creates a persistent disconnect between learned embeddings and feature-based pipelines. We introduce Embedding-Aware Feature Discovery (EAFD), a unified framework that bridges this gap by coupling pretrained event-sequence embeddings with a self-reflective LLM-driven feature generation agent. EAFD iteratively discovers, evaluates, and refines features directly from raw event sequences using two complementary criteria: \emph{alignment}, which explains information already encoded in embeddings, and \emph{complementarity}, which identifies predictive signals missing from them. Across both open-source and industrial transaction benchmarks, EAFD consistently outperforms embedding-only and feature-based baselines, achieving relative gains of up to $+5.8\%$ over state-of-the-art pretrained embeddings and setting a new state of the art across event-sequence datasets.
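The discover–evaluate–refine loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: every name here (`propose_feature`, `alignment_score`, `complementarity_score`, `eafd_loop`) is hypothetical, and the LLM agent and pretrained embeddings are stubbed out with toy scorers.

```python
# Hedged sketch of the EAFD loop: an agent proposes a feature, it is
# scored for alignment (explains what embeddings already encode) and
# complementarity (adds signal the embeddings miss), and the feedback
# is fed back to the agent for self-reflective refinement.
# All function names are illustrative assumptions, not the paper's API.

import random


def propose_feature(history):
    # Stand-in for the self-reflective LLM agent: in EAFD this would
    # condition on past evaluation feedback to propose a new feature.
    return {"name": f"feat_{len(history)}", "seed": random.random()}


def alignment_score(feature):
    # Toy proxy for: how well does this feature explain information
    # already encoded in the pretrained embeddings?
    return feature["seed"]


def complementarity_score(feature):
    # Toy proxy for: how much predictive signal does this feature add
    # beyond the embeddings?
    return 1.0 - feature["seed"]


def eafd_loop(n_rounds=5, keep_threshold=0.4):
    kept, history = [], []
    for _ in range(n_rounds):
        feat = propose_feature(history)
        a = alignment_score(feat)
        c = complementarity_score(feat)
        # Self-reflection: the agent sees prior evaluations next round.
        history.append({"feature": feat["name"],
                        "alignment": a,
                        "complementarity": c})
        # Retain features that contribute missing predictive signal.
        if c >= keep_threshold:
            kept.append(feat["name"])
    return kept
```

In the actual framework the proposal step is an LLM call and the two criteria are computed against real pretrained event-sequence embeddings; the sketch only shows the control flow of iterative discovery, evaluation, and retention.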