Hunan; Jilin; University of Electronic Science and Technology | Jun 8, 2026 | arXiv:2606.10046

Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models

AI Summary

This paper investigates the attention dynamics of flow-matching transformers for audio separation, using a deterministic, inference-time causal-intervention probing protocol adapted to SAM Audio. The authors report a dual-pathway text-conditioning mechanism, in which additive injections control semantic identity and cross-attention refines acoustic structure, together with an asynchronous layerwise convergence, in which stable layers settle early while fast layers keep resolving artifacts during sampling. Building on these findings, the training-free Layer-Selective Attention Caching (LSAC) method cuts self-attention computation by about 25% with negligible quality loss and yields up to 6.7x higher quality retention than naive step reduction.

Key Contribution

Layer-Selective Attention Caching (LSAC), a training-free method that caches attention in stable layers, cuts self-attention computation by about 25% with negligible quality loss and yields up to 6.7x higher quality retention than naive step reduction.

Abstract

Flow-matching transformers achieve strong audio separation, yet their attention dynamics are opaque. We adapt established causal-intervention principles into a deterministic, inference-time probing protocol for SAM Audio. Orthogonal probing uncovers a dual-pathway text-conditioning mechanism: additive injections control semantic identity, while cross-attention refines acoustic structure. We observe an asynchronous layerwise convergence: stable layers build temporal scaffolds early, whereas fast layers continue resolving artifacts during sampling. The model also attenuates temporal segmentation cues to maintain continuous-flow stability. Using these insights, we propose Layer-Selective Attention Caching (LSAC), a training-free acceleration method that caches attention in stable layers. Across acoustic complexities, LSAC cuts self-attention computation by about 25% with negligible quality loss and yields up to 6.7x higher quality retention than naive step reduction.
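The idea of caching attention only in stable layers can be illustrated with a toy sampling loop. The sketch below is not the authors' implementation: the function names (`attention_weights`, `apply_weights`, `sample`), the tiny single-head attention, the residual update, and the choice of which layer counts as "stable" are all hypothetical, chosen only to show how reusing a cached attention map in selected layers reduces the number of self-attention computations across sampling steps.

```python
import math
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]


def attention_weights(x: Matrix) -> Matrix:
    """Toy single-head attention map: row-wise softmax of x @ x.T / sqrt(d)."""
    d = len(x[0])
    weights = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def apply_weights(w: Matrix, x: Matrix) -> Matrix:
    """Mix the rows of x with the attention map w (computes w @ x)."""
    d = len(x[0])
    return [[sum(w_ij * x[j][c] for j, w_ij in enumerate(row)) for c in range(d)]
            for row in w]


def sample(x: Matrix, num_layers: int, num_steps: int,
           stable_layers: Set[int]) -> Tuple[Matrix, int]:
    """Run a toy multi-step sampler; return the final state and the number
    of attention maps that were actually computed."""
    cache: Dict[int, Matrix] = {}
    computed = 0
    for _ in range(num_steps):
        for layer in range(num_layers):
            if layer in stable_layers and layer in cache:
                w = cache[layer]          # stable layer: reuse the cached map
            else:
                w = attention_weights(x)  # fast layer, or first visit
                computed += 1
                if layer in stable_layers:
                    cache[layer] = w
            mixed = apply_weights(w, x)
            # Hypothetical residual update standing in for a transformer block.
            x = [[xi + 0.1 * mi for xi, mi in zip(xr, mr)]
                 for xr, mr in zip(x, mixed)]
    return x, computed


x0 = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
_, baseline_calls = sample(x0, num_layers=4, num_steps=8, stable_layers=set())
_, cached_calls = sample(x0, num_layers=4, num_steps=8, stable_layers={0})
savings = 1 - cached_calls / baseline_calls
```

In this toy configuration the saving depends only on how many layers are marked stable and on the number of sampling steps; it is not meant to reproduce the roughly 25% figure reported in the abstract, and it says nothing about quality retention, which would have to be measured on the real model.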

Interpretability & Mechanistic Interp; Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
