May 25, 2026 · arXiv:2605.25941

Where Concept Erasure Should Occur: Concept-Layer Alignment in Text-to-Video Diffusion Models

AI Summary

The paper observes that text-to-video diffusion models encode semantic information unevenly across layers, a property the authors call "concept-layer topological alignment": target concepts are more separable at specific depths. The authors argue that concept erasure is most effective at these depths because the target concept is less entangled with non-target signals there. Building on this, they introduce CLEAR, an optimization framework that selects layers for concept erasure based on concept-non-target separability, which they report improves concept suppression while preserving generative quality.

Key Contribution

Concept erasure in text-to-video diffusion models is more precise when it targets the layers where the target concept is most separable from non-target information.

Abstract

Text-to-video diffusion transformers encode semantic information unevenly across model depth, which constrains effective concept erasure. We identify a representational bottleneck, termed concept-layer topological alignment, under which target concepts exhibit higher separability at certain representational depths. Outside these depths, concept and non-target signals remain strongly entangled, limiting the effectiveness of depth-specific erasure. This observation reframes concept erasure as the problem of identifying representational depths where concept-non-target separation naturally emerges. Motivated by this structural constraint, we introduce CLEAR, a separability-driven optimization framework for concept erasure that explicitly enforces concept-layer alignment. CLEAR operationalizes this principle by formulating layer selection as an optimization problem over concept-non-target separability, rather than relying on layer-agnostic or heuristic choices. To enable this, we introduce a separability-aware objective that favors layers exhibiting stronger concept-non-target separation. Experiments on large-scale text-to-video diffusion models demonstrate that enforcing concept-layer alignment leads to more precise concept suppression while preserving overall generative quality.
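
As a rough illustration of the layer-selection idea described in the abstract, the following Python sketch scores each layer with a Fisher-style separability ratio and picks the highest-scoring layer. The abstract does not specify CLEAR's actual objective, so the ratio itself, the function names (`separability`, `select_layer`, `toy_features`), and the synthetic activations are all assumptions made for illustration, not the authors' method.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def within_scatter(rows, mu):
    """Average squared distance of the rows from their mean mu."""
    total = sum(sum((x - m) ** 2 for x, m in zip(row, mu)) for row in rows)
    return total / len(rows)

def separability(concept, non_target):
    """Fisher-style ratio (an assumed stand-in for the paper's objective):
    squared distance between class means over summed within-class scatter."""
    mu_c, mu_n = mean_vector(concept), mean_vector(non_target)
    between = sum((a - b) ** 2 for a, b in zip(mu_c, mu_n))
    within = within_scatter(concept, mu_c) + within_scatter(non_target, mu_n)
    return between / (within + 1e-12)  # small constant avoids division by zero

def select_layer(features_by_layer):
    """Given {layer: (concept_feats, non_target_feats)}, return the layer
    with the highest separability score together with all scores."""
    scores = {layer: separability(c, n)
              for layer, (c, n) in features_by_layer.items()}
    best = max(scores, key=scores.get)
    return best, scores

def toy_features(rng, offset, n=64, dim=8):
    """Synthetic activations: non-target centered at 0, concept shifted by offset."""
    concept = [[rng.gauss(offset, 1.0) for _ in range(dim)] for _ in range(n)]
    non_target = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    return concept, non_target

rng = random.Random(0)
# Hypothetical per-layer offsets: layer 2 is built to be the most separable.
offsets = {0: 0.1, 1: 0.5, 2: 3.0, 3: 0.8}
features = {layer: toy_features(rng, off) for layer, off in offsets.items()}
best_layer, scores = select_layer(features)
print("selected layer:", best_layer)
```

In a real model the per-layer features would come from hidden activations collected on prompts with and without the target concept, and the erasure update would then be applied only at the selected depth. A simple argmax is used here for clarity; the paper frames selection as an optimization problem, which this sketch does not reproduce.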

Computer Vision, Interpretability & Mechanistic Interp, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
