This paper introduces a systematic evaluation framework for mechanistic interpretability in single-cell foundation models, comprising 37 analyses and 153 statistical tests across four cell types and two perturbation modalities. Applying this framework to scGPT and Geneformer, the authors find that attention patterns encode structured biological information, such as protein-protein interactions and transcriptional regulation, but that this structure does not improve perturbation prediction over simple gene-level baselines. They also introduce Cell-State Stratified Interpretability (CSSI), which addresses an attention-specific scaling failure and improves gene regulatory network (GRN) recovery by up to 1.85x.
Attention mechanisms in single-cell foundation models capture co-expression patterns rather than unique regulatory signals, challenging assumptions about their mechanistic interpretability.
We present a systematic evaluation framework (37 analyses, 153 statistical tests, four cell types, two perturbation modalities) for assessing mechanistic interpretability in single-cell foundation models. Applying this framework to scGPT and Geneformer, we find that attention patterns encode structured biological information with layer-specific organisation (protein-protein interactions in early layers, transcriptional regulation in late layers), but this structure provides no incremental value for perturbation prediction: trivial gene-level baselines outperform both attention and correlation edges (AUROC 0.81-0.88 versus 0.70), pairwise edge scores add zero predictive contribution, and causal ablation of regulatory heads produces no degradation. These findings generalise from K562 to RPE1 cells; the attention-correlation relationship is context-dependent, but gene-level dominance is universal. Cell-State Stratified Interpretability (CSSI) addresses an attention-specific scaling failure, improving gene regulatory network (GRN) recovery by up to 1.85x. The framework establishes reusable quality-control standards for the field.
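The comparison between pairwise edge scores and gene-level baselines can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `auroc` and `gene_level_scores`, the toy edges, and every number below are made up for illustration, and the gene-level baseline is assumed here to be any score that depends only on the target gene and ignores which regulator it is paired with.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]  # (source gene, target gene)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the usual AUROC definition.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gene_level_scores(edges: List[Edge], gene_score: Dict[str, float]) -> List[float]:
    """Score each edge using only a per-gene property of its target (assumed baseline)."""
    return [gene_score[target] for _, target in edges]


# Toy data, invented for illustration only (not results from the paper).
edges: List[Edge] = [("TF1", "A"), ("TF1", "B"), ("TF2", "A"), ("TF2", "C")]
labels = [1, 0, 1, 0]                      # 1 = edge responds to perturbation
pairwise = [0.60, 0.50, 0.40, 0.45]        # hypothetical attention-derived edge scores
per_gene = {"A": 0.9, "B": 0.2, "C": 0.3}  # hypothetical per-gene property

pairwise_auroc = auroc(pairwise, labels)
baseline_auroc = auroc(gene_level_scores(edges, per_gene), labels)
print(f"pairwise edges: {pairwise_auroc:.2f}, gene-level baseline: {baseline_auroc:.2f}")
```

In this toy setup the baseline wins because the positive edges share one responsive target gene. That is the kind of gene-level effect a pairwise score has to beat before it can be said to add predictive value.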