KTH · May 4, 2026 · arXiv:2605.02804

Multi-Axis Speech Similarity via Factor-Partitioned Embeddings

AI Summary

This paper introduces a factor-partitioned embedding framework for speech that disentangles multiple attributes like linguistic content, speaker identity, and dialect into distinct subspaces of a single embedding vector. Each subspace is trained via distillation from specialist teachers or contrastive learning. The resulting embeddings enable attribute-conditioned retrieval through signed weighted sums of per-axis cosine similarities. Experiments on cross-corpus retrieval demonstrate the framework's ability to suppress speaker bias and retrieve semantically matched utterances across diverse recording conditions.

Key Contribution

Stop letting speaker identity drown out semantic similarity: this new embedding method lets you independently control the influence of different speech attributes when comparing utterances.

Abstract

Speech encodes multiple simultaneous attributes (linguistic content, speaker identity, dialect, gender) that conventional single-vector embeddings conflate. We present a factor-partitioned embedding framework that maps each utterance into a single vector whose subspaces correspond to distinct axes of variation. A shared acoustic encoder feeds per-axis linear projection heads, each trained via distillation from a specialist teacher or a contrastive objective over shared-label pairs. The resulting embeddings support attribute-conditioned retrieval: similarity is computed as a signed weighted sum over per-axis cosine scores, allowing retrieval that jointly considers what was said and how, or explicitly suppresses one attribute to surface another. We evaluate on cross-corpus retrieval over corpora sharing the Harvard sentence prompts, demonstrating that signed axis weighting can suppress same-speaker bias and surface semantically matched utterances across recording conditions.
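
The signed weighted sum over per-axis cosine scores can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the axis names, the slice boundaries in `AXES`, the toy vectors, the weights, and the helper names `cosine`, `multi_axis_similarity`, and `rank`.

```python
import math

# Hypothetical layout: each axis owns a contiguous slice of the embedding.
# Axis names and slice sizes are illustrative, not the paper's configuration.
AXES = {"content": (0, 3), "speaker": (3, 5)}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def multi_axis_similarity(x, y, weights):
    """Signed weighted sum of per-axis cosine scores.

    A positive weight rewards similarity on an axis; a negative weight
    penalizes it, which is how an attribute can be suppressed.
    """
    score = 0.0
    for axis, w in weights.items():
        start, stop = AXES[axis]
        score += w * cosine(x[start:stop], y[start:stop])
    return score


def rank(query, candidates, weights):
    """Return candidate keys sorted by descending multi-axis similarity."""
    return sorted(
        candidates,
        key=lambda k: multi_axis_similarity(query, candidates[k], weights),
        reverse=True,
    )


# Toy data: the first three dimensions stand in for content, the last two
# for speaker identity.
query = [1.0, 0.0, 0.0, 1.0, 0.0]
candidates = {
    "same_speaker_other_text": [0.0, 1.0, 0.0, 1.0, 0.0],
    "other_speaker_same_text": [1.0, 0.0, 0.0, 0.0, 1.0],
}

# Reward matching content, penalize matching speaker.
content_first = rank(query, candidates, {"content": 1.0, "speaker": -0.5})
# Rank by speaker similarity alone.
speaker_only = rank(query, candidates, {"speaker": 1.0})
```

With these toy vectors, the negative speaker weight pushes the same-speaker candidate below the utterance that matches in content, while the speaker-only weighting reverses the order. A real system would obtain the vectors from the trained projection heads rather than writing them by hand.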

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
