The paper introduces causal circuit tracing, a method that ablates sparse autoencoder (SAE) features and measures downstream responses to map feature-to-feature interactions in biological foundation models such as Geneformer V2-316M and scGPT. Applied to both models, the method finds inhibitory dominance and biological coherence regardless of architecture or cell type. Cross-model comparison identifies conserved domain pairs, and disease-associated domains are more likely to fall in this consensus set; however, CRISPRi validation suggests that these relationships reflect co-expression rather than direct causal encoding.
Biological foundation models like Geneformer and scGPT, despite architectural differences, converge on surprisingly similar computational motifs characterized by inhibitory dominance and biological coherence.
Motivation: Sparse autoencoders (SAEs) decompose foundation model activations into interpretable features, but causal feature-to-feature interactions across network depth remain unknown for biological foundation models.

Results: We introduce causal circuit tracing by ablating SAE features and measuring downstream responses, and apply it to Geneformer V2-316M and scGPT whole-human across four conditions (96,892 edges, 80,191 forward passes). Both models show approximately 53 percent biological coherence and 65 to 89 percent inhibitory dominance, invariant to architecture and cell type. scGPT produces stronger effects (mean absolute d = 1.40 vs. 1.05) with more balanced dynamics. Cross-model consensus yields 1,142 conserved domain pairs (10.6x enrichment, p < 0.001). Disease-associated domains are 3.59x more likely to be consensus. Gene-level CRISPRi validation shows 56.4 percent directional accuracy, confirming co-expression rather than causal encoding.
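The ablate-and-measure idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' pipeline: the toy model (one `tanh` layer), the random SAE weights, the helper names `cohens_d` and `trace_feature`, the reading of "d" as a Cohen's d effect size with a pooled standard deviation, and the edge threshold of 0.5 are all assumptions made for illustration. The abstract also does not state how the sign of d maps to "inhibitory", so the sketch only reports signed effects.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def cohens_d(baseline, ablated, eps=1e-12):
    """Effect size (ablated minus baseline) along axis 0, using a pooled SD.

    Assumption: "d" in the abstract is a Cohen's d style statistic.
    """
    mean_diff = ablated.mean(axis=0) - baseline.mean(axis=0)
    pooled_var = 0.5 * (baseline.var(axis=0, ddof=1) + ablated.var(axis=0, ddof=1))
    return mean_diff / np.sqrt(pooled_var + eps)


def trace_feature(x, src_enc, src_dec, layer_w, tgt_enc, feature_idx):
    """Ablate one upstream SAE feature and score every downstream SAE feature.

    Hypothetical toy setup: x holds upstream activations (cells x d_model),
    and a single tanh layer stands in for the rest of the network.
    """
    src_acts = relu(x @ src_enc)  # upstream SAE feature activations
    # Remove only this feature's decoded contribution from the activations.
    contribution = np.outer(src_acts[:, feature_idx], src_dec[feature_idx])
    x_ablated = x - contribution
    # Downstream SAE features with and without the ablation.
    tgt_base = relu(np.tanh(x @ layer_w) @ tgt_enc)
    tgt_abl = relu(np.tanh(x_ablated @ layer_w) @ tgt_enc)
    return cohens_d(tgt_base, tgt_abl)


rng = np.random.default_rng(0)
n_cells, d_model, n_src, n_tgt = 200, 16, 8, 6
x = rng.normal(size=(n_cells, d_model))
src_enc = rng.normal(size=(d_model, n_src))
src_dec = rng.normal(size=(n_src, d_model))
layer_w = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
tgt_enc = rng.normal(size=(d_model, n_tgt))

d = trace_feature(x, src_enc, src_dec, layer_w, tgt_enc, feature_idx=0)
edges = np.abs(d) > 0.5  # illustrative threshold, not taken from the paper
print("signed effect sizes:", np.round(d, 2))
print("edges kept:", int(edges.sum()), "of", n_tgt)
print("fraction of negative effects:", float((d < 0).mean()))
```

Repeating `trace_feature` over every upstream feature would give a feature-to-feature edge list; each call corresponds to one ablated forward pass in this toy setting.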