This paper introduces an Information Bottleneck (IB) regularizer for Concept Bottleneck Models (CBMs) that minimizes the mutual information between the input and the concept layer, $I(X;C)$, while preserving the task-relevant information $I(C;Y)$. The authors derive a variational objective and an entropy-based surrogate for practical implementation, and integrate both into CBM training without architectural changes or extra supervision. Experiments across six CBM families and three benchmarks show that IB-regularized models outperform vanilla CBMs, improving both predictive performance and the reliability of concept-level interventions.
Concept Bottleneck Models get a faithfulness boost: an information bottleneck regularizer minimizes concept leakage and improves predictive performance without architectural changes or extra supervision.
Concept Bottleneck Models (CBMs) aim to deliver interpretable predictions by routing decisions through a human-understandable concept layer, yet they often suffer from reduced accuracy and from concept leakage that undermines faithfulness. We introduce an explicit Information Bottleneck regularizer on the concept layer that penalizes $I(X;C)$ while preserving the task-relevant information $I(C;Y)$, encouraging minimal-sufficient concept representations. We derive two practical variants (a variational objective and an entropy-based surrogate) and integrate them into standard CBM training without architectural changes or additional supervision. Evaluated across six CBM families and three benchmarks, the IB-regularized models consistently outperform their vanilla counterparts. Information-plane analyses further corroborate the intended behavior. These results indicate that enforcing a minimal-sufficient concept bottleneck improves both predictive performance and the reliability of concept-level interventions. The proposed regularizer offers a theoretically grounded, architecture-agnostic path to more faithful and intervenable CBMs. It also resolves prior evaluation inconsistencies by aligning training protocols, and it demonstrates robust gains across model families and datasets.
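To make the idea of penalizing $I(X;C)$ while preserving $I(C;Y)$ concrete, the following is a minimal sketch of one common way such a trade-off is implemented. It is not the paper's objective, which is not spelled out here. It assumes a Gaussian concept encoder $q(c \mid x) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ with a standard normal prior, so that the batch-averaged KL term upper-bounds $I(X;C)$, and it uses the task cross-entropy as the usual stand-in for preserving $I(C;Y)$. The function names, the weight `beta`, and the Gaussian parameterization are all illustrative assumptions; the concept-supervision loss of a real CBM is omitted for brevity.

```python
import math

# Hypothetical helper names; none of these come from the paper.

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one example.

    Averaged over inputs, this upper-bounds I(X;C) under the assumed
    Gaussian encoder and fixed standard normal prior.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example with an integer class label."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]

def ib_regularized_loss(batch_logits, labels, batch_mu, batch_log_var, beta=0.01):
    """Return (total, task, compression) for one batch.

    total = task cross-entropy + beta * mean KL compression term.
    beta is an assumed trade-off weight, not a value from the paper.
    """
    n = len(labels)
    task = sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / n
    compression = sum(gaussian_kl_to_standard_normal(m, lv)
                      for m, lv in zip(batch_mu, batch_log_var)) / n
    return task + beta * compression, task, compression

# Toy batch: 4 examples, 3 classes, 5 concept dimensions.
logits = [[0.0, 0.0, 0.0] for _ in range(4)]
labels = [0, 1, 2, 0]
mu = [[1.0] * 5 for _ in range(4)]
log_var = [[0.0] * 5 for _ in range(4)]
total, task, compression = ib_regularized_loss(logits, labels, mu, log_var, beta=0.1)
```

In this sketch, a larger `beta` pushes the concept distribution toward the prior and so discards more input information, while a smaller `beta` favors task accuracy. Because the penalty only adds a term to the loss, it is compatible with the abstract's claim that no architectural change is required, under the assumption that the concept layer already outputs (or can be read as) distribution parameters.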