May 25, 2026, arXiv:2605.26323

Totoro$^+$: An Adaptive and Scalable Edge Federated Learning System

Cheng-Wei Ching, Xin Chen, Taehwan Kim, Jian-Jhih Kuo, Dilma Da Silva, Liting Hu

AI Summary

Totoro$^+$ is a decentralized federated learning (FL) system that uses a distributed hash table (DHT)-based peer-to-peer (P2P) model to let massive FL applications run simultaneously on edge networks. It assigns a dedicated parameter server to each application, and any edge node can act as coordinator, aggregator, client selector, or worker. The system combines a locality-aware P2P multi-ring structure, a publish/subscribe-based forest abstraction, and a game-theoretic path planning model. In experiments on 500 Amazon EC2 servers, it speeds up total training time by $1.2\times$ to $14.0\times$ and scales gracefully with the number of applications and edge nodes.
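To make the idea of a per-application parameter server on a DHT more concrete, here is a minimal sketch assuming a single consistent-hashing ring with a successor rule. This is an illustration only, not Totoro$^+$'s actual protocol (which uses a locality-aware multi-ring structure); the names `ring_id`, `assign_parameter_server`, `RING_BITS`, and the node and application labels are all hypothetical.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # assumed identifier width for this sketch


def ring_id(name: str) -> int:
    """Hash a string onto a ring of size 2**RING_BITS."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[: RING_BITS // 8], "big")


def assign_parameter_server(app_name: str, node_names: list[str]) -> str:
    """Return the node whose ring ID is the first at or after the app's ID."""
    ring = sorted((ring_id(n), n) for n in node_names)
    ids = [i for i, _ in ring]
    # Wrap around to the first node if the app ID is past the last node ID.
    idx = bisect_left(ids, ring_id(app_name)) % len(ring)
    return ring[idx][1]


if __name__ == "__main__":
    nodes = [f"edge-node-{i}" for i in range(8)]  # hypothetical node names
    for app in ["speech-model", "keyboard-model", "vision-model"]:
        print(app, "->", assign_parameter_server(app, nodes))
```

Under this assumed rule, different applications tend to hash to different nodes, so no single node serves as the parameter server for everything, and removing a node that does not own an application leaves that application's assignment unchanged.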

Key Contribution

Forget centralized parameter servers: Totoro$^+$'s decentralized architecture lets you run massive federated learning applications simultaneously on edge networks, scaling gracefully and adapting to network churn.

Abstract

Federated Learning (FL) is an emerging distributed machine learning (ML) technique that enables in-situ model training and inference on decentralized edge devices. We propose Totoro$^+$, a novel scalable FL system that enables massive FL applications to run simultaneously on edge networks. The key insight is to explore a distributed hash table (DHT)-based peer-to-peer (P2P) model to re-architect the centralized FL system design into a fully decentralized one. In contrast to previous studies where many FL applications shared one centralized parameter server, Totoro$^+$ assigns a dedicated parameter server to each application. Any edge node can act as any application's coordinator, aggregator, client selector, worker (participant device), or any combination of the above, thereby radically improving scalability and adaptivity. Totoro$^+$ introduces three innovations to realize its design: a locality-aware P2P multi-ring structure, a publish/subscribe-based forest abstraction, and a game-theoretic path planning model with a guarantee of an $\epsilon$-approximate Nash equilibrium. Real-world experiments on 500 Amazon EC2 servers show that Totoro$^+$ scales gracefully with the number of FL applications and $N$ edge nodes, speeds up the total training time by $1.2\times$ to $14.0\times$, achieves $\mathcal{O}(\log N)$ hops for model dissemination and gradient aggregation with millions of nodes, and efficiently adapts to practical edge networks and churn.
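As a rough intuition for why tree-structured aggregation can reach $\mathcal{O}(\log N)$ hops, the sketch below sums per-node values up a static, complete tree with a fixed fan-out and reports how many hops the deepest node's update travels to the root. It assumes an array-indexed tree and scalar "gradients"; it is not the paper's publish/subscribe forest, and `aggregate_over_tree` and `fanout` are hypothetical names.

```python
def aggregate_over_tree(values: list[float], fanout: int) -> tuple[float, int]:
    """Sum values up a complete fanout-ary tree.

    Node 0 is the root and node i's parent is (i - 1) // fanout.
    Returns (total at the root, hops from the deepest node to the root).
    """
    n = len(values)
    partial = list(values)
    # Visit nodes from last to first so children are folded in before parents.
    for i in range(n - 1, 0, -1):
        partial[(i - 1) // fanout] += partial[i]
    # The last node is at maximum depth in this layout; count its hops to the root.
    depth, i = 0, n - 1
    while i > 0:
        i = (i - 1) // fanout
        depth += 1
    return partial[0], depth


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        total, hops = aggregate_over_tree([1.0] * n, fanout=4)
        print(f"N={n}: total={total}, hops={hops}")
```

In this simplified setting the hop count grows with the logarithm of the node count (base equal to the fan-out), which is the scaling behavior the abstract reports for model dissemination and gradient aggregation.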

Distributed Systems & Hardware

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
