Feb 15, 2026 · arXiv:2602.14039

Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models

AI Summary

The paper identifies a geometric inconsistency in Mixture-of-Experts (MoE) embedding models: expert outputs lie on a shared hyperspherical manifold, and linear aggregation pulls the combined embedding inward, distorting its magnitude and direction. To address this, the authors introduce Spherical Barycentric Aggregation (SBA), an aggregation operator that preserves the hyperspherical geometry by separating radial and angular components. Experiments on selected MTEB tasks show that SBA consistently improves performance at identical training cost and with stable training, supporting the importance of geometry-aware aggregation.

Key Contribution

Linear aggregation in MoE embedding models distorts the geometry of expert representations by pulling aggregated embeddings inside the hypersphere on which expert outputs lie. Spherical Barycentric Aggregation restores the hyperspherical structure and improves performance on the evaluated MTEB tasks.

Abstract

Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. This assumption is shown to be inconsistent with the geometry of expert representations. Geometric analysis of a modern MoE embedding model reveals that expert outputs lie on a shared hyperspherical manifold characterized by tightly concentrated norms and substantial angular separation. Under this geometry, linear aggregation induces inward collapse toward the manifold interior, distorting vector magnitude and direction and reducing embedding comparability. To address this inconsistency, Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing mechanisms. Experiments on selected tasks from the Massive Text Embedding Benchmark (MTEB), including semantic similarity, clustering, and duplicate question detection, demonstrate consistent performance improvements with identical training cost and full stability. Additional geometric analyses confirm that SBA prevents aggregation-induced collapse and preserves hyperspherical consistency, highlighting the importance of geometry-aware aggregation in MoE embedding architectures.
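The collapse described in the abstract can be illustrated with a small sketch. The abstract does not give the exact SBA formulation, so the code below is an assumption: it takes the radial component to be the routing-weighted mean of expert norms and the angular component to be the renormalized routing-weighted mean of unit directions. The function names `linear_aggregate` and `spherical_aggregate` are hypothetical and do not come from the paper.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def linear_aggregate(experts, weights):
    """Standard MoE aggregation: weighted linear sum of expert outputs."""
    dim = len(experts[0])
    return [sum(w * e[i] for w, e in zip(weights, experts)) for i in range(dim)]


def spherical_aggregate(experts, weights, eps=1e-12):
    """Assumed SBA-style operator (not the paper's exact definition).

    Radial part: weighted mean of expert norms.
    Angular part: weighted mean of unit directions, renormalized to length 1.
    """
    radius = sum(w * norm(e) for w, e in zip(weights, experts))
    units = [[x / max(norm(e), eps) for x in e] for e in experts]
    mean_dir = linear_aggregate(units, weights)
    scale = radius / max(norm(mean_dir), eps)
    return [scale * x for x in mean_dir]


# Two orthogonal unit-norm expert outputs with equal routing weights.
experts = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, 0.5]

linear_out = linear_aggregate(experts, weights)
sba_out = spherical_aggregate(experts, weights)

print(norm(linear_out))  # about 0.707: pulled inside the unit sphere
print(norm(sba_out))     # about 1.0: stays on the unit sphere
```

With two orthogonal unit vectors and equal weights, the linear sum has norm of roughly 0.707, while the assumed spherical variant keeps the norm at the shared radius and points in the same direction. Because the operator only consumes the routing weights, it is consistent with the abstract's statement that SBA remains compatible with existing routing mechanisms.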

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
