Mar 5, 2026 · arXiv:2603.04831

Missingness Bias Calibration in Feature Attribution Explanations

AI Summary

The paper addresses missingness bias in feature attribution explanations, a systematic distortion that arises when a model is probed with ablated, out-of-distribution inputs. The authors introduce MCal, a lightweight post-hoc calibration method that fine-tunes a linear head on the outputs of a frozen base model to correct this bias. Across medical benchmarks spanning vision, language, and tabular domains, MCal is competitive with or outperforms prior, heavier methods, which suggests that missingness bias can be treated as a superficial artifact of the output space rather than a deep representational flaw.

Key Contribution

No retraining or architectural changes are needed: fine-tuning a simple linear head on a frozen model's outputs consistently reduces missingness bias in feature attribution and is competitive with, or outperforms, heavyweight methods.

Abstract

Popular explanation methods often produce unreliable feature importance scores due to missingness bias, a systematic distortion that arises when models are probed with ablated, out-of-distribution inputs. Existing solutions treat this as a deep representational flaw that requires expensive retraining or architectural modifications. In this work, we challenge this assumption and show that missingness bias can be effectively treated as a superficial artifact of the model's output space. We introduce MCal, a lightweight post-hoc method that corrects this bias by fine-tuning a simple linear head on the outputs of a frozen base model. Surprisingly, we find this simple correction consistently reduces missingness bias and is competitive with, or even outperforms, prior heavyweight approaches across diverse medical benchmarks spanning vision, language, and tabular domains.
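The idea of correcting bias with a linear head on frozen outputs can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the training objective or data, so the code below assumes a cross-entropy fit against true labels on logits from ablated inputs, and it simulates missingness bias as a constant logit shift toward one class. The names `fit_linear_head`, `biased`, and the synthetic data are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def accuracy(logits, labels):
    return np.mean(logits.argmax(axis=1) == labels)

def fit_linear_head(logits, labels, num_classes, lr=0.05, steps=1000):
    """Fit W, b so softmax(logits @ W + b) matches labels.

    Assumption: plain gradient descent on cross-entropy. The base model
    is never touched; only its output logits are used as features.
    """
    n = logits.shape[0]
    W = np.eye(num_classes)   # identity init: head starts as a no-op
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        probs = softmax(logits @ W + b)
        grad_z = (probs - onehot) / n      # gradient w.r.t. head outputs
        W -= lr * logits.T @ grad_z
        b -= lr * grad_z.sum(axis=0)
    return W, b

# Synthetic stand-in for a frozen model's logits on ablated inputs.
rng = np.random.default_rng(0)
num_classes, n = 3, 600
labels = rng.integers(0, num_classes, size=n)
clean = 2.0 * np.eye(num_classes)[labels] + rng.normal(0.0, 1.0, (n, num_classes))
# Assumed form of missingness bias: ablation pushes outputs toward class 0.
biased = clean + np.array([3.0, 0.0, 0.0])

W, b = fit_linear_head(biased, labels, num_classes)
calibrated = biased @ W + b

loss_before = cross_entropy(biased, labels)
loss_after = cross_entropy(calibrated, labels)
acc_before = accuracy(biased, labels)
acc_after = accuracy(calibrated, labels)
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
print(f"accuracy: {acc_before:.3f} -> {acc_after:.3f}")
```

In this toy setting the bias lives entirely in the output space, so a linear map over the logits can undo it without changing the base model. Whether real missingness bias behaves this way is the empirical claim the paper tests.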

Interpretability & Mechanistic Interp

Citation Metrics

Citations: 0

Influential citations: 0

References: 61

Year: 2026

Venue: N/A
