Mar 3, 2026 · arXiv:2603.03155

Information Routing in Atomistic Foundation Models: How Equivariance Creates Linearly Disentangled Representations

AI Summary

The paper introduces Composition Projection Decomposition (CPD), a method that uses QR projection to linearly remove compositional signal from atomistic model representations and then probes the geometric residual. Applying CPD to eight models from five architectural families on QM9 and Materials Project data, the study finds a disentanglement gradient: tensor product equivariant architectures (MACE) keep geometry almost fully linearly accessible after composition removal, whereas handcrafted descriptors (ANI-2x) entangle the same information nonlinearly. MACE also routes target-specific signal through particular irreducible representation channels (dipole to $L = 1$, HOMO-LUMO gap to $L = 0$), which suggests that information is organized in a structured way within these models.

Key Contribution

Tensor product equivariant models such as MACE learn representations in which geometry is linearly disentangled from composition. This makes linear probes on downstream targets more sample-efficient, a practical advantage beyond raw prediction accuracy.

Abstract

What do atomistic foundation models encode in their intermediate representations, and how is that information organized? We introduce Composition Projection Decomposition (CPD), which uses QR projection to linearly remove composition signal from learned representations and probes the geometric residual. Across eight models from five architectural families on QM9 molecules and Materials Project crystals, we find a disentanglement gradient: tensor product equivariant architectures (MACE) produce representations where geometry is almost fully linearly accessible after composition removal ($R^2_{\text{geom}} = 0.782$ for HOMO-LUMO gap), while handcrafted descriptors (ANI-2x) entangle the same information nonlinearly ($R^2_{\text{geom}} = -0.792$ under Ridge; $R^2 = +0.784$ under MLP). MACE routes target-specific signal through irreducible representation channels -- dipole to $L = 1$, HOMO-LUMO gap to $L = 0$ -- a pattern not observed in ViSNet's vector-scalar architecture under the same probe. We show that gradient boosted tree probes on projected residuals are systematically inflated, recovering $R^2 = 0.68$--$0.95$ on a purely compositional target, and recommend linear probes as the primary metric. Linearly disentangled representations are more sample-efficient under linear probing, suggesting a practical advantage for equivariant architectures beyond raw prediction accuracy.
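The projection-then-probe idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `remove_composition` and `ridge_r2`, the synthetic data, the centering step, and the choice to fit the projection on all samples at once are assumptions made for illustration. The sketch orthogonalizes a composition matrix with a reduced QR factorization, subtracts its span from the representation, and then fits a closed-form ridge probe on the residual.

```python
import numpy as np

def remove_composition(X, C):
    """Project the column space of C out of X (illustrative helper).

    X: (n_samples, n_features) representation matrix.
    C: (n_samples, n_elements) composition features, e.g. element counts.
    Both are mean-centered first (an assumption of this sketch).
    """
    Xc = X - X.mean(axis=0)
    Cc = C - C.mean(axis=0)
    Q, _ = np.linalg.qr(Cc)        # reduced QR: Q has orthonormal columns
    return Xc - Q @ (Q.T @ Xc)     # residual is orthogonal to the columns of Cc

def ridge_r2(X_train, y_train, X_test, y_test, alpha=1.0):
    """Closed-form ridge probe; returns held-out R^2."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    A = X_train - x_mean
    gram = A.T @ A + alpha * np.eye(A.shape[1])
    w = np.linalg.solve(gram, A.T @ (y_train - y_mean))
    pred = (X_test - x_mean) @ w + y_mean
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic toy data (hypothetical, not from the paper): the representation
# is a linear mix of composition counts and independent "geometric" factors.
rng = np.random.default_rng(0)
n, n_elem, n_geom, n_feat = 400, 5, 3, 16
C = rng.integers(0, 6, size=(n, n_elem)).astype(float)
G = rng.normal(size=(n, n_geom))
X = C @ rng.normal(size=(n_elem, n_feat)) + G @ rng.normal(size=(n_geom, n_feat))

R = remove_composition(X, C)

y_geom = G @ np.array([1.0, -0.5, 0.25])   # target driven by geometry only
y_comp = C @ np.ones(n_elem)               # purely compositional target

train, test = slice(0, 300), slice(300, None)
r2_geom = ridge_r2(R[train], y_geom[train], R[test], y_geom[test])
r2_comp = ridge_r2(R[train], y_comp[train], R[test], y_comp[test])
print(f"R2 geometric target:     {r2_geom:.3f}")
print(f"R2 compositional target: {r2_comp:.3f}")
```

In this toy setting the linear probe recovers the geometric target from the residual but not the purely compositional one. That is the kind of sanity check the abstract describes when it reports that gradient boosted tree probes recover a compositional target from projected residuals while recommending linear probes as the primary metric.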

Architecture Design (Transformers, SSMs, MoE)

Interpretability & Mechanistic Interp

Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 20

Year: 2026

Venue: N/A
