Durham University, University of St Andrews | Mar 10, 2026 | arXiv:2603.09642

Multi-DNN Inference of Sparse Models on Edge SoCs

Jiawei Luo, Di Wu, Simon Dobson, Blesson Varghese

AI Summary

This paper introduces "model stitching," a technique for creating diverse model variants in multi-DNN inference systems by recombining subgraphs from sparse models without retraining. The goal is to improve the efficiency of matching models to suitable accelerators in edge SoCs, thereby reducing service level objective (SLO) violation rates. Experiments with SparseLoom, a demonstrator system, show that model stitching reduces SLO violation rates by up to 74%, improves throughput by up to 2.31x, and lowers memory overhead by an average of 28% compared to existing systems.

Key Contribution

By recombining subgraphs from sparse models without retraining, "model stitching" creates a diverse set of model variants that significantly improves the efficiency of multi-DNN inference on edge SoCs.

Abstract

Modern edge applications increasingly require multi-DNN inference systems to execute tasks on heterogeneous processors, gaining performance both from concurrent execution and from matching each model to the best-suited accelerator. However, existing systems support only a single model (or a few sparse variants) per task, which impedes the efficiency of this matching and results in high Service Level Objective (SLO) violation rates. We introduce model stitching for multi-DNN inference systems, which creates model variants by recombining subgraphs from sparse models without re-training. We present a demonstrator system, SparseLoom, that shows model stitching can be deployed to SoCs. We show experimentally that SparseLoom reduces SLO violation rates by up to 74%, improves throughput by up to 2.31x, and lowers memory overhead by an average of 28% compared to state-of-the-art multi-DNN inference systems.
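The abstract does not spell out how stitched variants are formed, so the following is only a minimal sketch of the general idea of recombining subgraphs, not SparseLoom's actual implementation. It assumes (hypothetically) that every sparse variant of a model is split into the same number of subgraphs at the same positions, and that subgraphs at the same position are interchangeable. The function name `stitch_variants` and the variant names are illustrative placeholders.

```python
from itertools import product


def stitch_variants(sparse_models):
    """Enumerate stitched variants from a set of sparse models.

    sparse_models maps a variant name to its ordered list of subgraphs.
    Assumption: all variants are split at the same positions, so the
    subgraph at position i of one variant can replace position i of another.
    """
    names = list(sparse_models)
    num_positions = len(sparse_models[names[0]])
    if any(len(sparse_models[n]) != num_positions for n in names):
        raise ValueError("all sparse models must have the same number of subgraphs")

    variants = []
    # For every position, pick which sparse model supplies the subgraph.
    for choice in product(names, repeat=num_positions):
        subgraphs = [sparse_models[src][pos] for pos, src in enumerate(choice)]
        variants.append((choice, subgraphs))
    return variants


# Hypothetical input: three sparse variants, each split into three subgraphs.
sparse_models = {
    "dense": ["dense/sg0", "dense/sg1", "dense/sg2"],
    "pruned": ["pruned/sg0", "pruned/sg1", "pruned/sg2"],
    "quantized": ["quantized/sg0", "quantized/sg1", "quantized/sg2"],
}

variants = stitch_variants(sparse_models)
print(len(variants))  # 3 sources over 3 positions gives 3**3 = 27 variants
```

Under these assumptions, K sparse models split into S subgraphs yield K^S candidate variants without any retraining, which illustrates why stitching can give a scheduler far more options to match against accelerators than a handful of whole-model variants.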

Architecture Design (Transformers, SSMs, MoE) | Distributed Systems & Hardware | Inference & Quantization

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
