The paper investigates a largely unexplored type of causal relationship in DNNs: encoded absences, where the *absence* of a concept increases neural activation. The authors demonstrate that mainstream XAI methods struggle to reveal these absences when applied in their standard form. To address this, they propose two extensions to attribution and feature visualization techniques that uncover encoded absences, and they show that accounting for such absences improves debiasing and explanations of ImageNet models.
DNN neurons often fire *more* strongly when a concept is missing, revealing a blind spot in standard XAI methods that can now be addressed.
Explainable artificial intelligence (XAI) aims to provide human-interpretable insights into the behavior of deep neural networks (DNNs), typically by estimating a simplified causal structure of the model. In existing work, this causal structure often includes relationships where the presence of a concept is associated with a strong activation of a neuron. For example, attribution methods primarily identify input pixels that contribute most to a prediction, and feature visualization methods reveal inputs that cause high activation of a target neuron; the former implicitly assume that the relevant information resides in the input, and the latter that neurons encode the presence of concepts. However, a largely overlooked type of causal relationship is that of encoded absences, where the absence of a concept increases neural activation. In this work, we show that such missing but relevant concepts are common and that mainstream XAI methods struggle to reveal them when applied in their standard form. To address this, we propose two simple extensions to attribution and feature visualization techniques that uncover encoded absences. Across experiments, we show how mainstream XAI methods can be used to reveal and explain encoded absences, how ImageNet models exploit them, and that debiasing can be improved when considering them.
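
To make the notion of an encoded absence concrete, the following is a minimal toy sketch; it is not the extension proposed in the paper. It assumes a single hypothetical ReLU neuron over two made-up binary concept features (`stripes`, `water`) with invented weights, where the negative weight on `water` means the neuron fires more strongly when water is absent. It contrasts plain gradient-times-input attribution, which assigns zero to a concept whose input value is zero, with a difference-from-baseline attribution that uses a reference input in which the concept is present. The baseline choice is an assumption made only for this illustration.

```python
# Toy illustration of an "encoded absence"; all names and numbers are hypothetical.

WEIGHTS = {"stripes": 1.0, "water": -2.0}  # negative weight: absence of water raises activation
BIAS = 2.0


def pre_activation(x):
    """Linear part of the neuron for a dict of concept values."""
    return sum(WEIGHTS[k] * x[k] for k in WEIGHTS) + BIAS


def activation(x):
    """ReLU neuron."""
    return max(0.0, pre_activation(x))


def gradient_times_input(x):
    """Standard gradient * input attribution for the ReLU neuron."""
    gate = 1.0 if pre_activation(x) > 0 else 0.0  # ReLU derivative
    return {k: gate * WEIGHTS[k] * x[k] for k in WEIGHTS}


def attribution_vs_baseline(x, baseline):
    """Attribution relative to a reference input.

    Exact here only because both inputs lie in the same linear
    region of the ReLU (positive pre-activation).
    """
    return {k: WEIGHTS[k] * (x[k] - baseline[k]) for k in WEIGHTS}


water_absent = {"stripes": 1.0, "water": 0.0}
water_present = {"stripes": 1.0, "water": 1.0}

act_absent = activation(water_absent)
act_present = activation(water_present)

# Gradient * input gives the absent concept no credit, because its input is zero.
standard = gradient_times_input(water_absent)

# With a baseline where water is present, the missing water explains the activation gain.
contrastive = attribution_vs_baseline(water_absent, water_present)
```

In this toy setting the activation is higher without water than with it, yet gradient-times-input attributes nothing to `water`, whereas the baseline-relative attribution assigns the full activation difference to its absence. How the paper's actual extensions achieve this on image models is not specified in the abstract.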