This paper introduces a parameter-free decomposition for Mixture-of-Experts (MoE) models that separates each layer's hidden state into a control signal that drives routing and an orthogonal content channel. The authors show that while individual experts remain polysemantic, the paths tokens take through the experts become monosemantic, clustering tokens by semantic function across languages and surface forms. The results suggest that MoE interpretability should focus on token trajectories rather than on the experts themselves, which clarifies how routing decisions shape model behavior.
Routing decisions in MoEs can create distinct semantic paths for tokens, revealing that interpretability hinges on trajectories rather than individual experts.
An LLM's residual stream is both state and instruction: it encodes the current context and determines the next transformation. We introduce a parameter-free decomposition for Mixture-of-Experts models that splits each layer's hidden state into a control signal that causally drives routing and an orthogonal content channel invisible to the router. Across six MoE architectures, we find that models preserve surface-level features (language, token identity, position) in the content channel, while the control signal encodes an abstract function that rotates from layer to layer. Because each routing decision is low-bandwidth, this hand-off forces compositional specialization across layers. While individual experts remain polysemantic, expert paths become monosemantic, clustering tokens by semantic function across languages and surface forms. The same token (e.g., ":") follows distinct trajectories depending on whether it serves as a type annotation, an introductory colon, or a time separator. Our decomposition identifies the source of this structure: clusters in the control subspace are substantially more monosemantic than those in the full representation. As a result, the natural unit of interpretability in MoEs is not the expert but the trajectory.
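The abstract does not spell out how the split is computed, so the following is only a minimal sketch under a stated assumption: the router is linear with no bias (logits = `W_router @ h`), the control signal is the projection of the hidden state onto the row space of the router weights, and the content channel is the remainder. The function name `split_control_content`, the variable `W_router`, and the toy dimensions are hypothetical and do not come from the paper.

```python
import numpy as np

def split_control_content(h, W_router, tol=1e-10):
    """Split hidden state(s) h into a router-visible part and its orthogonal remainder.

    Assumption (not confirmed by the source): the router computes W_router @ h,
    so only the component of h in the row space of W_router can affect routing.
    """
    # Right singular vectors with non-negligible singular values span the row space.
    _, s, Vt = np.linalg.svd(W_router, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    V = Vt[:rank].T                      # (d_model, rank), orthonormal columns

    control = (h @ V) @ V.T              # projection onto the router's row space
    content = h - control                # orthogonal complement, unseen by the router
    return control, content

# Toy example with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
d_model, n_experts = 16, 4
W_router = rng.normal(size=(n_experts, d_model))
h = rng.normal(size=d_model)

control, content = split_control_content(h, W_router)
router_logits_full = W_router @ h
router_logits_control = W_router @ control
router_logits_content = W_router @ content
```

Under this assumption the split needs no learned parameters: the control part reproduces the router logits exactly, while the content part contributes nothing to them, which matches the description of a channel "invisible to the router".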