May 28, 2026 · arXiv:2605.29489

Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

Yuanyi Wang, Yanggan Gu, Su Lu, Yifan Yang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Hongxia Yang

AI Summary

This paper introduces MergePipe, an execution layer that reformulates weight-space model merging as an expert access-set problem: given a merge operator, it chooses which expert delta blocks to read under an explicit I/O budget. By indexing parameter blocks and constructing deterministic access plans, MergePipe reduces expert-read I/O by up to an order of magnitude and speeds up merging by up to 11×. In representative budget sweeps on Qwen and Llama workloads, parameters deviate from the full-read merge by O(10^-3), and downstream benchmarks show no monotonic degradation.

Key Contribution

MergePipe makes merging large language models faster and cheaper, cutting expert-read I/O by up to an order of magnitude (roughly 90%) with no monotonic degradation on downstream benchmarks.

Abstract

Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an expert access-set problem: given a merge operator and a checkpoint family in a shared weight coordinate system, choose which expert delta blocks to access under an explicit I/O budget. MergePipe indexes parameter blocks, builds deterministic access plans, and executes the induced budgeted merge with replayable manifests. The plan is budget-sound by construction and recovers the full-read merge at full budget; for fixed-coefficient additive operators, the omitted-update error is bounded by the norm of omitted deltas. Across Qwen and Llama merging workloads, MergePipe reduces expert-read I/O by up to an order of magnitude and achieves up to $11\times$ speedups. Representative budget sweeps show $O(10^{-3})$ parameter deviation from full-read merges and no monotonic degradation on downstream benchmarks.
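To make the access-set framing concrete, the following is a minimal sketch of a budgeted additive merge. It is not MergePipe's implementation: the greedy norm-per-byte ranking, the fp32 block size, the index layout, and every function name (`build_index`, `plan_access`, `merge`, `omitted_bound`) are assumptions chosen for illustration. It shows the three properties the abstract states: the plan never exceeds the byte budget, the full budget recovers the full-read merge, and for a fixed-coefficient additive operator the deviation is bounded by the summed norms of the omitted deltas (by the triangle inequality).

```python
import math

BYTES_PER_PARAM = 4  # assumption: blocks stored as fp32


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def build_index(deltas, coeffs):
    """One record per expert delta block: (key, size_bytes, |coeff| * ||delta||_2)."""
    return [
        (key, BYTES_PER_PARAM * len(vals), abs(coeffs[key[0]]) * l2_norm(vals))
        for key, vals in sorted(deltas.items())
    ]


def plan_access(index, budget_bytes):
    """Greedy, deterministic plan: highest weighted norm per byte first, ties by key."""
    ranked = sorted(index, key=lambda rec: (-rec[2] / rec[1], rec[0]))
    chosen, used = [], 0
    for key, size, _score in ranked:
        if used + size <= budget_bytes:  # never exceed the budget
            chosen.append(key)
            used += size
    return sorted(chosen), used


def merge(base, deltas, coeffs, access_set):
    """Fixed-coefficient additive merge restricted to the blocks in access_set."""
    merged = {name: list(vals) for name, vals in base.items()}
    for expert, name in access_set:
        for i, d in enumerate(deltas[(expert, name)]):
            merged[name][i] += coeffs[expert] * d
    return merged


def deviation(a, b):
    """L2 distance between two merged checkpoints with identical block names."""
    return math.sqrt(sum((x - y) ** 2 for n in a for x, y in zip(a[n], b[n])))


def omitted_bound(index, access_set):
    """Sum of |coeff| * ||delta|| over the blocks the plan skipped."""
    kept = set(access_set)
    return sum(score for key, _size, score in index if key not in kept)


# Toy data (hypothetical): two experts, two blocks each, keyed by (expert, block).
base = {"layer0.w": [1.0, 2.0], "layer1.w": [0.5, -0.5]}
deltas = {
    ("A", "layer0.w"): [0.3, -0.4],
    ("A", "layer1.w"): [0.0, 0.01],
    ("B", "layer0.w"): [0.06, 0.08],
    ("B", "layer1.w"): [-0.6, 0.8],
}
coeffs = {"A": 0.5, "B": 0.5}

index = build_index(deltas, coeffs)
full = merge(base, deltas, coeffs, sorted(deltas))

plan, used = plan_access(index, budget_bytes=16)        # half of the 32 bytes
budgeted = merge(base, deltas, coeffs, plan)

full_plan, full_used = plan_access(index, budget_bytes=32)
print(plan, used)
print(deviation(full, budgeted), "<=", omitted_bound(index, plan))
```

In this sketch the index holds only block sizes and norms, so planning does not touch the delta payloads themselves; only the blocks named in the plan are read during the merge. Because the ranking breaks ties by key, the same index and budget always yield the same plan, which is the property a replayable manifest would rely on.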

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Scaling Laws & Emergent Abilities

Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
