Apr 27, 2026

Rakshit Soni +9·Apr 27, 2026

OpenPodcar2: a robust, ROS2 vehicle for self-driving research

Democratizing self-driving research, OpenPodcar2 offers a robust, low-cost (≈$7k new, $2k used), open-source autonomous vehicle platform ready for ROS2 integration and real-world deployment.

Rakshit Soni, Rakshit Soni, Chris Waltham +7

Distributed Systems & Hardware Open-Source Models & Weights Robotics & Embodied AI

Jiaming Qiu +1·Apr 27, 2026

On the Benefits of Traffic "Reprofiling" -- The Multiple Hops Case -- Part II

Reprofiling network traffic isn't just for fancy schedulers – even simple FIFO and static priority setups can see significant bandwidth savings while guaranteeing hard delays.

Jiaming Qiu, Roch Guérin

Distributed Systems & Hardware

All Papers (100)

Apr 27, 2026

Alex Bienstock +7·Apr 27, 2026

Scalable Secure Biometric Authentication without Auxiliary Identifiers

Finally, a practical biometric authentication system offers provable security against large-scale data breaches without sacrificing scalability or requiring auxiliary identifiers.

Alex Bienstock, Daniel Escudero, Antigoni Polychroniadou +5

Distributed Systems & Hardware Inference & Quantization

Zeyu Bai·Apr 27, 2026

Spark Policy Toolkit: Semantic Contracts and Scalable Execution for Policy Learning in Spark

Spark Policy Toolkit unlocks scalable policy learning in Spark by guaranteeing consistent results even with distributed execution, finally making it possible to apply complex policy learning techniques to large datasets.

Zeyu Bai

Distributed Systems & Hardware Inference & Quantization Training Efficiency & Optimization

Independent Researcher·Apr 27, 2026

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

Squeeze your LLM inference costs: PolyKV slashes KV cache memory by up to 97% using a shared, compressed pool, with negligible impact on quality.

Ishan Patel, Ishan Patel, Ishan Joshi +1

Distributed Systems & Hardware Inference & Quantization

Ruhr University Bochum·Apr 27, 2026

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference

Not all layers are created equal: pruning the KV cache in a layer-dependent manner significantly boosts long-context LLM performance compared to uniform pruning strategies.

Zahra Dehghanighobadi, Asja Fischer

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Iizalaarab Elhaimeur +3·Apr 27, 2026

Latency and Cost of Multi-Agent Intelligent Tutoring at Scale

Multi-agent LLM systems can maintain sub-4-second response times even under classroom-scale concurrency, but only with the right throughput tier.

Iizalaarab Elhaimeur, Iizalaarab Elhaimeur, Nikos Chrisochoides +1

Distributed Systems & Hardware Inference & Quantization Tool Use & Agents

Razwan Ahmed Tanvir +3·Apr 27, 2026

A Tree-Based Repository Blockchain Framework for Shared Governance in Collaborative Fork Ecosystems

Ditch the complexity of Inter-Blockchain Communication: this tree-based blockchain framework lets you navigate hard forks like directories in a file system.

Razwan Ahmed Tanvir, Razwan Ahmed Tanvir, Greg Speegle +1

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware

Apr 27, 2026

Network Impact of Post-Quantum Certificate Chain sizes on Time to First Byte in TLS Deployments

Quantum-safe certificates bloat TLS handshakes so much that they measurably degrade web performance, and current CDN optimizations aren't enough to fully compensate.

Matthew Chou, Phuong M Cao

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Apr 27, 2026

A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

Split learning offers a surprisingly viable path to fine-tuning LLMs on sensitive data without breaking the bank or sacrificing privacy.

Zihan Liu, Yizhen Wang, Xiu Tang +1

Distributed Systems & Hardware Natural Language Processing Training Efficiency & Optimization

Apr 27, 2026

Extended Abstract: Shaperd: Easily Adoptable Real-Time Traffic Shaper for Fully Encrypted Protocols

Traffic shaping can be both powerful and practical: Shaperd lets you customize encrypted traffic flows in real-time to evade censorship without killing throughput.

Sarah Wilson, Stella Tian, Sina Kamali

Distributed Systems & Hardware Red-Teaming & Adversarial Robustness

Lavi Jain +1·Apr 27, 2026

RowHammer Vulnerability Counter (RVC): Redefining RowHammer Detection with Victim-Centric Tracking

Forget activation counts – RVC slashes RowHammer mitigation overhead by up to 99.99% by directly tracking a row's vulnerability to bit flips.

Lavi Jain, Venkata Kalyan Tavva

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware

Hang Deng +1·Apr 27, 2026

Information-Theoretic Distributed Point Functions with Shorter Keys

Asymptotically shorter secret keys in Information-Theoretic Distributed Point Functions are now possible, thanks to a novel construction leveraging private information retrieval.

Hang Deng, Liang Feng Zhang

Distributed Systems & Hardware Recommendation & Information Retrieval

Verdict Security·Apr 27, 2026·also Ain Shams University

Machine-Checked Cardinality Bounds for Masked Barrett Reduction: A 1-Bit Side-Channel Leakage Barrier in Post-Quantum Cryptographic Hardware

Forget complex side-channel analysis: a single, machine-checked theorem proves that masked Barrett reduction leaks at most *one bit* of information per wire, offering a universal security guarantee for post-quantum crypto.

Ray Iskander, Khaled Kirah

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Apr 27, 2026·also Prince of Songkla University

Resolving Conflicts Between RTOS Timekeeping and Uninterruptable Trusted Computing

Guaranteeing atomicity in secure enclaves doesn't have to break real-time OS timekeeping – a secure-driven synchronization mechanism can unobtrusively keep everything in sync.

Antonio Joia Neto, Amarin Laohajirapan, Norrathep Rattanavipanon +1

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware

Apr 27, 2026·also CUHK

Mono2Sls: Automated Monolith-to-Serverless Migration via Multi-Stage Pipeline with Static Analysis

Automating monolith-to-serverless migration is now possible with an LLM-powered pipeline that outperforms commercial tools.

Xingyan Chen, Yuxin Su, Zishan Su +2

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Distributed Systems & Hardware

Michael A. Laurenzano +5·Apr 27, 2026

Incisor: Ex Ante Cloud Instance Selection for HPC Jobs

Forget manual cloud HPC instance selection: Incisor uses LLMs to slash runtime and costs by over 40% with zero human intervention.

Michael A. Laurenzano, M. Laurenzano, Shihan Cheng +3

Distributed Systems & Hardware

Apr 27, 2026·also Huawei

SDSL-Solver: Scalable Distributed Sparse Linear Solvers for Large-Scale Interior Point Methods

Solving massive optimization problems just got a whole lot faster: SDSL-Solver achieves up to 97x speedups over PARDISO by distributing sparse linear system solves across multiple nodes.

Shaofeng Yang, Yunting Wang, Yingying Cheng +3

Distributed Systems & Hardware Training Efficiency & Optimization

Xi Shen +3·Apr 27, 2026

Opto-Atomic Spatio-Temporal Holographic Correlators for High-Speed 3D CNNs

Ditch silicon bottlenecks: a novel optoelectronic correlator uses cold atoms to accelerate 3D CNNs by orders of magnitude.

Xi Shen, Bowen Qi, Tabassom Hamidfar +1

Architecture Design (Transformers, SSMs, MoE) Computer Vision Distributed Systems & Hardware

Chen Feng +19·Apr 27, 2026·also Nankai University, UC Santa Cruz

FreeScale: Distributed Training for Sequence Recommendation Models with Minimal Scaling Cost

Sequence recommendation models can achieve near-perfect scaling efficiency in distributed training, slashing wasted GPU cycles by up to 90%.

Chen Feng, Haoli Zhang, Sh. B. Ali-zade +17

Distributed Systems & Hardware Recommendation & Information Retrieval Training Efficiency & Optimization

Taeyoon Kim +4·Apr 27, 2026

KubePACS: Kubernetes Cluster Using Performant, Highly Available, and Cost Efficient Spot Instances

Stop paying a 55% performance-per-dollar premium: KubePACS optimizes Kubernetes spot instance provisioning for cost, performance, and availability, blowing away existing solutions.

Taeyoon Kim, Kyumi Kim, Enrique Molina-Giménez +2

Distributed Systems & Hardware Training Efficiency & Optimization

Tsinghua AI·Apr 27, 2026

FlashOverlap: Minimizing Tail Latency in Communication Overlap for Distributed LLM Training

FlashOverlap shatters the tail latency bottleneck in distributed LLM training by orchestrating peer-to-peer communication with fine-grained computation overlap.

Rezaul Karim, Austin Wen, Zongzuo Wang +3

Distributed Systems & Hardware Training Efficiency & Optimization

Milo Liebster +2·Apr 27, 2026

Déjà Vu Packing: Optimizing FPGA Logic Clustering Runtime via Pattern Memoization

FPGA CAD tools waste enormous time re-checking the same cluster packings, but a simple memoization trick can slash runtime by up to 29x.

Milo Liebster, Amin Mohaghegh, Andrew Boutros

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Training Efficiency & Optimization

Anthony Faure-Gignoux +3·Apr 27, 2026

Compilation and Execution of an Embeddable YOLO-NAS on the VTA

Compiling and executing YOLO-NAS on an FPGA-based accelerator is now possible, opening doors for real-time object detection in safety-critical applications like aeronautics.

Anthony Faure-Gignoux, Kevin Delmas, Adrien Gauffriau +1

Computer Vision Distributed Systems & Hardware Inference & Quantization

Wang Fan +7·Apr 27, 2026

Salca: A Sparsity-Aware Hardware Accelerator for Efficient Long-Context Attention Decoding

Forget A100s for long-context LLMs – Salca achieves up to 74x better energy efficiency with a sparsity-aware hardware accelerator.

Wang Fan, Wei Cao, Xionghui Zha +5

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Meta AI·Apr 27, 2026·also School of Cybersecurity

Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

Storing user interaction histories in a normalized, immutable tier and reconstructing sequences just-in-time slashes data infrastructure costs and unlocks the potential of ultra-long sequence DLRMs.

Liang Guo, Ge Song, Litao Deng +8

Distributed Systems & Hardware Recommendation & Information Retrieval Training Efficiency & Optimization

Andreas Kouloumpris +3·Apr 27, 2026·also KIOS Research and Innovation Center of Excellence

Exact, Efficient, and Reliable Multiobjective and Multiconstrained IoT Workflow Scheduling in Edge–Hub–Cloud Cyber–Physical Systems

Ditch the heuristics: MILP delivers up to 30% better latency, energy, and reliability for IoT workflow scheduling in edge-hub-cloud systems.

Andreas Kouloumpris, Georgios L. Stavrinides, Maria K. Michael +1

Distributed Systems & Hardware Robotics & Embodied AI

Apr 27, 2026·also ICT CAS, USTC

TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

Squeezing intermediate tensors with FP8 quantization and adaptive transforms can nearly double the throughput of tensor-parallel LLM training without sacrificing accuracy.

Man Liu, Xingjian Tian, Bing Lu +6

Distributed Systems & Hardware Inference & Quantization Training Efficiency & Optimization

Taeyoon Kim +6·Apr 27, 2026

SpotVista: Availability-Aware Recommendation System for Reliable and Cost-Efficient Multi-Node Spot Instances

Multi-node spot instance configurations recommended by SpotVista offer 81% greater availability and 26% more cost savings than current state-of-the-art and publicly available services.

Taeyoon Kim, Kyumi Kim, Kyunghwan Kim +4

Distributed Systems & Hardware Recommendation & Information Retrieval

Apr 25, 2026

Emre Ardıç +2·Apr 25, 2026·also Gebze Technical University

Enhanced Privacy and Communication Efficiency in Non-IID Federated Learning With Adaptive Quantization and Differential Privacy

Laplacian DP and adaptive quantization can slash federated learning communication costs by over 50% without sacrificing accuracy or privacy, even with non-IID data.

Emre Ardıç, Emre Ardiç, Yakup Genç

Distributed Systems & Hardware Inference & Quantization Training Efficiency & Optimization

Apr 23, 2026

Apr 23, 2026·also SJTU

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

LLMs, when combined with efficient indexing, can extract actionable incidents from just a handful of noisy user descriptions in real-time, enabling rapid anomaly detection in large-scale cloud services.

Jun Wang, Ziyin Zhang, Rui Wang +3

Distributed Systems & Hardware Natural Language Processing Recommendation & Information Retrieval

Apr 23, 2026

A-IC3: Learning-Guided Adaptive Inductive Generalization for Hardware Model Checking

Stop guessing which inductive generalization strategy works best for IC3 – this adaptive, learning-guided approach solves significantly more hardware model checking problems.

Xiaofeng Zhou, Guangyu Hu, Hongce Zhang +1

Distributed Systems & Hardware

Ashley Abraham +4·Apr 23, 2026

Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask

Scale up your nearest neighbor search without blowing your budget: this work shows how to use Dask to parallelize Product Quantization and Inverted Indexing, achieving accuracy comparable to single-machine methods on much larger datasets.

Ashley Abraham, Andrew Strelzoff, Haley R. Dozier +2

Distributed Systems & Hardware Inference & Quantization Recommendation & Information Retrieval

Rishona Daniels +4·Apr 23, 2026

On the Role of Preprocessing and Memristor Dynamics in Reservoir Computing for Image Classification

Volatile memristors can achieve state-of-the-art image classification accuracy in reservoir computing, even with significant device variability, suggesting they are a viable alternative to traditional CMOS.

Rishona Daniels, Duna Wattad, Ronny Ronen +2

Architecture Design (Transformers, SSMs, MoE) Computer Vision Distributed Systems & Hardware

Guilin Deng +14·Apr 23, 2026

Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

FedLLMs, thought to be safer due to data localization, are shockingly vulnerable: a new attack achieves near 100% membership inference accuracy, even with differential privacy.

Guilin Deng, Guilin Deng, Silong Chen +12

Distributed Systems & Hardware Red-Teaming & Adversarial Robustness Training Efficiency & Optimization

Emilie Frost +1·Apr 23, 2026

Architectures for Robust Self-Organizing Energy Systems under Information and Control Constraints

Securing energy grids against cyberattacks may hinge on clever observer/controller architectures that respect data privacy and regulatory constraints.

Emilie Frost, Astrid Nieße

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Red-Teaming & Adversarial Robustness

Apr 23, 2026

Multilinguality at the Edge: Developing Language Models for the Global South

Deploying language models in the Global South requires bridging the gap between multilingual NLP and edge computing, two fields that have largely evolved independently despite their shared goals.

Lester James Validad Miranda, Songbo Hu, Roi Reichart +1

Distributed Systems & Hardware Inference & Quantization Natural Language Processing

Maryam Taghi Zadeh +1·Apr 23, 2026

Physically Unclonable Functions for Secure IoT Authentication and Hardware-Anchored AI Model Integrity

Software-only security is insufficient for AI-enabled IoT in adversarial settings: hardware-rooted trust mechanisms, like PUFs and hybrid designs, are the only viable path forward.

Maryam Taghi Zadeh, Mohsen Ahmadi

Distributed Systems & Hardware

Jens Kanstrup Larsen +5·Apr 23, 2026

NEST: Network Enforced Session Types (Technical Report)

Guarantee application-level protocol compliance without touching application code by pushing runtime verification into the network itself.

Jens Kanstrup Larsen, A. Scalas, Guy Amir +3

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware

Anvitha Ramachandran +2·Apr 23, 2026

GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA

Vision GNNs can achieve near 100x speedups on FPGAs by decoupling graph construction from feature updates, enabling concurrent execution without significant accuracy loss after fine-tuning.

Anvitha Ramachandran, Dhruv Parikh, Viktor K. Prasanna

Architecture Design (Transformers, SSMs, MoE) Computer Vision Distributed Systems & Hardware

Liam Burns +6·Apr 23, 2026

A Case Study in Recovery of Drones using Discrete-Event Systems

Guaranteeing swarm drone recovery from faults is now possible with a hybrid discrete-event system that merges high-level supervision with low-level control.

Liam Burns, Dayse M. Cavalcanti, Felipe G. Cabral +4

Distributed Systems & Hardware Robotics & Embodied AI

Kashish Mittal +5·Apr 23, 2026

Optimizing High-Throughput Distributed Data Pipelines for Reproducible Deep Learning at Scale

Data loading bottlenecks can strangle your GPU utilization down to 10%, but a few smart optimizations can unlock a 6x speedup.

Kashish Mittal, Di Yu, Roozbeh Ketabi +3

Data Curation & Synthetic Data Distributed Systems & Hardware Training Efficiency & Optimization

IIT·Apr 23, 2026·also Edinburgh

Leveraging SIMD for Accelerating Large-number Arithmetic

SIMD parallelism can finally unlock substantial speedups in large-number arithmetic by rethinking algorithms around data-parallel operations, yielding up to 19.3% throughput gains in scientific computing.

Subhrajit Das, Abhishek Bichhawat, Yuvraj Patel

Distributed Systems & Hardware Inference & Quantization Training Efficiency & Optimization

Mohan Liyanage +3·Apr 23, 2026

Risk-Aware and Stable Edge Server Selection Under Network Latency SLOs

Reduce deadline misses and server switching by explicitly accounting for tail risk and stability in edge server selection.

Mohan Liyanage, Arnova Abdullah, E. Zhantileuov +1

Distributed Systems & Hardware Inference & Quantization

Hongyao Liu +2·Apr 23, 2026

An Efficient Wireless iBCI Headstage with Adaptive ADC Sample Rate

A server-driven adaptive sampling approach slashes power consumption in wireless iBCIs by 40mW while *improving* decoding accuracy.

Hongyao Liu, Junyi Wang, L. Zhai

Distributed Systems & Hardware Inference & Quantization Training Efficiency & Optimization

Hongyao Liu +3·Apr 23, 2026

SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference

On-device LLM inference gets a massive speed and energy boost by adaptively streaming only the most expensive parts of the KV cache from the cloud.

Hongyao Liu, L. Zhai, Junyi Wang +1

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Mingqi Han +1·Apr 23, 2026

A Task Decomposition and Planning Framework for Efficient LLM Inference in AI-Enabled WiFi-Offload Networks

Forget simple offloading – this framework intelligently decomposes LLM tasks across devices and edge servers, slashing latency and boosting rewards in congested WiFi networks.

Mingqi Han, Xing Sun

Distributed Systems & Hardware Inference & Quantization Tool Use & Agents

Guoyu Li +11·Apr 23, 2026

SPAC: Automating FPGA-based Network Switches with Protocol Adaptive Customization

Forget hand-tuning: SPAC automatically generates FPGA-based network switches that slash latency by up to 38% while dramatically reducing resource usage.

Guoyu Li, Yang Cao, Lucas H L Ng +9

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware

Apr 23, 2026·also Microsoft Research, University of Wyoming

Position Paper: Denial-of-Service Against Multi-Round Transaction Simulation

MEV searchers beware: a new, low-cost DoS attack can cripple transaction bundling services like Flashbots by exploiting inter-transaction dependencies and atomic block inclusion.

Yuzhe Tang, Yibo Wang, Wanning Ding +2

Distributed Systems & Hardware Red-Teaming & Adversarial Robustness

Apr 23, 2026

Institutionalizing Best Practices in Research Computing: A Framework and Case Study for Improving User Onboarding

Frustrated by researchers struggling to access complex computing resources? This framework offers a practical solution for streamlining onboarding and boosting user success.

A. Chaturvedi, R. Pokorney, Elyn Fritz-Waters +4

Distributed Systems & Hardware

Arthur Douillard +16·Apr 23, 2026

Decoupled DiLoCo for Resilient Distributed Pre-training

Achieve zero global downtime in large-scale pre-training, even with millions of simulated chip failures, by decoupling learners and asynchronously aggregating parameter updates.

Arthur Douillard, Keith Rush, Yani Donchev +14

Distributed Systems & Hardware Training Efficiency & Optimization

Ankur Sharma +3·Apr 23, 2026

Time, Causality, and Observability Failures in Distributed AI Inference Systems

Clock skew as small as 5ms can break causality in observability data from distributed AI inference systems, even when the system is working perfectly.

Ankur Sharma, Deep Shah, David Lariviere +1

Distributed Systems & Hardware Inference & Quantization

Apr 22, 2026

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

Exact attention over billion-token sequences is now possible on a single GPU, thanks to a novel streaming approach that avoids out-of-memory errors without approximation.

Yiming Bian, Joshua M. Akey

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Sina Gholami +4·Apr 22, 2026

FedSIR: Spectral Client Identification and Relabeling for Federated Learning with Noisy Labels

Spectral analysis of client feature representations can identify and relabel noisy data in federated learning, outperforming existing noise-tolerant loss and loss-dynamic approaches.

Sina Gholami, Abdulmoneam Ali, Tania Haghighi +2

Data Curation & Synthetic Data Distributed Systems & Hardware Training Efficiency & Optimization

Hangzhou Dianzi University·Apr 22, 2026

Lifecycle-Aware Federated Continual Learning in Mobile Autonomous Systems

Layer-selective rehearsal and rapid recovery strategies can boost model performance in federated learning by over 30% in real-world applications.

Beining Wu

Distributed Systems & Hardware Robotics & Embodied AI Training Efficiency & Optimization

Samsung R&D Institute UK (SRUK)·Apr 22, 2026·also Samsung

Differentially Private Clustered Federated Learning with Privacy-Preserving Initialization and Normality-Driven Aggregation

Differentially private federated learning gets a boost: PINA achieves 2.9% higher accuracy than state-of-the-art methods by using a novel two-stage approach with privacy-preserving initialization and normality-driven aggregation.

Jie Xu, Haaris Mehmood, Rogier Van Dalen +2

Data Curation & Synthetic Data Distributed Systems & Hardware Training Efficiency & Optimization

Apr 22, 2026·also Côte d'Azur, Princeton, Sheffield, Université de la Polynésie française

Decentralized Machine Learning with Centralized Performance Guarantees via Gibbs Algorithms

Decentralized learning can match centralized performance by sharing only Gibbs measures, not datasets, opening new avenues for privacy-preserving collaboration.

Yaiza Bermudez, Samir Perlaza, Iñaki Esnaola

Distributed Systems & Hardware Training Efficiency & Optimization

H. Pham +1·Apr 22, 2026

Scalable AI Inference: Performance Analysis and Optimization of AI Model Serving

Optimizing AI inference can boost throughput and reduce latency, revealing strategies that enhance performance under real-world traffic conditions.

H. Pham, Fatih Gedikli

Distributed Systems & Hardware Inference & Quantization

Apr 22, 2026·also PolyU

Towards Event-Aware Forecasting in DeFi: Insights from On-chain Automated Market Maker Protocols

AMM price prediction accuracy jumps 56% by explicitly modeling the uncertainty in block intervals, revealing the critical role of on-chain event timing.

Huaiyu Jia, Jiehshun You, Yizhi Luo +2

Distributed Systems & Hardware

Prashant Pathak·Apr 22, 2026

Pre-Execution Query Slot-Time Prediction in Cloud Data Warehouses: A Feature-Scoped Machine Learning Approach

Predicting query slot-time consumption with a machine learning model reduces cost estimation errors by up to 37%, revolutionizing budget management in cloud data warehouses.

Prashant Pathak

Distributed Systems & Hardware Training Efficiency & Optimization

HP Inc.·Apr 22, 2026

A Delta-Aware Orchestration Framework for Scalable Multi-Agent Edge Computing

Scaling multi-agent systems past 100 agents can trigger a "Synergistic Collapse" costing hundreds of thousands of dollars, but this framework prevents it.

Samaresh Kumar Singh, Joyjit Roy

Computer Vision Distributed Systems & Hardware Tool Use & Agents

QreativeLab Inc. Montréal·Apr 22, 2026

Mythos and the Unverified Cage: Z3-Based Pre-Deployment Verification for Frontier-Model Sandbox Infrastructure

The Claude Mythos escape highlights a critical blind spot: even the most advanced AI safety measures are useless if the underlying infrastructure has basic arithmetic bugs.

Dominik Blain

Code Generation & Program Synthesis Distributed Systems & Hardware Red-Teaming & Adversarial Robustness

Deevashwer Rathee +4·Apr 22, 2026

Onyx: Cost-Efficient Disk-Oblivious ANN Search

Leaking user queries through disk access patterns in sensitive ANN search? Onyx flips the script on prior work to achieve up to 9.9x cost reduction and 12.3x latency improvement.

Deevashwer Rathee, Jean Watson, Zirui Neil Zhao +2

Distributed Systems & Hardware Inference & Quantization Recommendation & Information Retrieval

Verdict Security·Apr 22, 2026·also Ain Shams University

Fresh Masking Makes NTT Pipelines Composable: Machine-Checked Proofs for Arithmetic Masking in PQC Hardware

Machine-checked proofs now guarantee the security of arithmetic masking in NTT pipelines, but watch out: even a single lapse in "fresh masking" can expose vulnerabilities, as seen in the Adams Bridge accelerator.

Ray Iskander, Khaled Kirah

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Apr 22, 2026

DeepParse: Hybrid Log Parsing with LLM-Synthesized Regex Masks

LLMs can bootstrap accurate and efficient log parsing by synthesizing regex masks, enabling a hybrid approach that outperforms both heuristic and LLM-only methods.

Amir Shetaia, Sean Kauffman

Distributed Systems & Hardware Inference & Quantization Natural Language Processing

Yussur Mustafa Oraji +1·Apr 22, 2026

Extending Contract Verification for Parallel Programming Models to Fortran

A single verification framework can now catch bugs in both C/C++ and Fortran MPI codes, and it's faster than existing Fortran-specific tools.

Yussur Mustafa Oraji, Christian Bischof

Code Generation & Program Synthesis Distributed Systems & Hardware

Srujan Kumar Gandla·Apr 22, 2026

Characterizing and Fixing Silent Data Loss in Spark-on-AWS-Lambda with Open Table Formats

Lambda timeouts in Spark jobs writing to Delta Lake and Iceberg tables cause 100% silent data loss, but a simple wrapper can eliminate it.

Srujan Kumar Gandla

Distributed Systems & Hardware

Apr 22, 2026

Evaluating Computing Platforms for Sustainability: A Comparative Analysis of FPGAs against ASICs, GPUs, and CPUs

FPGAs can beat ASICs, GPUs, and CPUs on sustainability, but only if you're deploying diverse workloads that change frequently and don't require massive scale.

Chetan Choppali Sudarshan, Aman Arora, Vidya A Chhabria

Constitutional AI & AI Ethics Distributed Systems & Hardware

Apr 22, 2026·also Fermi National Accelerator Laboratory, NRL

Distributed Quantum-Enhanced Optimization: A Topographical Preconditioning Approach for High-Dimensional Search

Quantum computers can serve as effective "topographical preconditioners" to guide classical solvers in high-dimensional optimization, bypassing the limitations of both purely quantum and classical approaches.

Dominik Soós, Marc Paterno, John Stenger

Distributed Systems & Hardware Scientific Discovery & Drug Design Training Efficiency & Optimization

Apr 22, 2026·also CAS

FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving

Fine-grained management of speculative decoding phases can boost LLM serving throughput by over 50% and cut latency nearly in half.

Wenyan Chen, Chengzhi Lu, Yanying Lin +1

Distributed Systems & Hardware Inference & Quantization

Apr 22, 2026·also IBM Research, Kyung Hee University, Notre Dame

Distributed Quantum Optimization for Large-Scale Higher-Order Problems with Dense Interactions

Quantum optimization can now tackle previously intractable, large-scale scientific optimization problems with dense, higher-order interactions, outperforming classical methods in both speed and solution quality.

Seongmin Kim, Vincent R. Pascuzzi, Travis S. Humble +5

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Training Efficiency & Optimization

Kyungmi Lee +5·Apr 22, 2026

EnergAIzer: Fast and Accurate GPU Power Estimation Framework for AI Workloads

Forget hours-long simulations: EnergAIzer slashes GPU power estimation time to seconds while maintaining accuracy, by exploiting structured patterns in AI kernel optimizations.

Kyungmi Lee, Zhiye Song, Eun Kyung Lee +3

Distributed Systems & Hardware Inference & Quantization Training Efficiency & Optimization

Naser Khatti Dizabadi +1·Apr 22, 2026

A Novel Low-Power Cache Architecture Based on 6-Transistor SRAM Cells

Stacking SRAM cells slashes leakage power without adding transistors.

Naser Khatti Dizabadi, Ceyda Elcin Kaya

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Apr 22, 2026·also Syracuse, UIUC, Washington State

Worst-Case Optimal GPU Datalog

Datalog on GPUs just got a whole lot faster: SRDatalog achieves up to 47x speedups by finally making worst-case optimal joins practical on GPUs.

Yihao Sun, Kunting Qi, Thomas Gilray +2

Code Generation & Program Synthesis Distributed Systems & Hardware

Università di Camerino and Gran Sasso·Apr 22, 2026·also Gran Sasso Science Institute, NOVA School of Science and Technology

Automatic Code and Test Generation of Smart Contracts from Coordination Models

Automating smart contract creation from high-level coordination models slashes development time and boosts reliability.

Elvis Konjoh Selabi, Maurizio Murgia, António Ravara +1

Code Generation & Program Synthesis Distributed Systems & Hardware

Apr 22, 2026·also Samsung Electronics, UIUC

PVAC: A RowHammer Mitigation Architecture Exploiting Per-victim-row Counting

Flipping the script on RowHammer defense, PVAC counts activations on victim rows instead of aggressors, slashing false positives and boosting performance.

Jumin Kim, Seungmin Baek, Hwayong Nam +3

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Zeyu Shen +1·Apr 22, 2026

Temporally Extended Mixture-of-Experts Models

Dramatically cut MoE expert-switching rates (from 50% to <5%) with minimal accuracy loss by training a controller to decide *when* to switch, not just *which* expert to use.

Zeyu Shen, Peter Henderson

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Training Efficiency & Optimization

Apr 21, 2026

Clemson University·Apr 21, 2026

Accelerating Optimization and Machine Learning through Decentralization

Decentralizing optimization can paradoxically *accelerate* machine learning convergence, beating centralized methods even when per-iteration time is held constant.

Ziqin Chen, Zuang Wang

Distributed Systems & Hardware Training Efficiency & Optimization

Apr 21, 2026

SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference under Hard Uplink Budgets

Naive attention-based filtering for edge-cloud inference is suboptimal under tight bandwidth constraints; prioritizing semantic diversity in transmitted embeddings yields surprisingly large accuracy gains.

Inhyeok Choi, Hyuncheol Park

Distributed Systems & Hardware Inference & Quantization

Apr 21, 2026

Heterogeneity-Aware Personalized Federated Learning for Industrial Predictive Analytics

Personalized federated learning can now handle the messy reality of heterogeneous industrial data, enabling more accurate failure time predictions across diverse clients.

Yuhan Hu, Xiaolei Fang

Distributed Systems & Hardware Training Efficiency & Optimization

Yi Zhao +4·Apr 21, 2026

Optimal Routing for Federated Learning over Dynamic Satellite Networks: Tractable or Not?

Routing optimization in federated learning over dynamic satellite networks reveals clear boundaries between tractable and intractable problems, with practical algorithms for the former.

Yi Zhao, Di Yuan, Tao Deng +2

Distributed Systems & Hardware Training Efficiency & Optimization

University of Lübeck·Apr 21, 2026·also University of Pisa

Scalable Memristive-Friendly Reservoir Computing for Time Series Classification

Compact, gradient-free MARS models can now outperform state-of-the-art gradient-based sequence models like Mamba, while slashing training times from hours to milliseconds.

Coşku Can Horuz, Andrea Ceni, Claudio Gallicchio

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Harekrushna Sahu +2·Apr 21, 2026

FedSEA: Achieving Benefit of Parallelization in Federated Online Learning

Online federated learning can actually benefit from parallelization, but only if temporal data variation is mild relative to stochastic gradient variance—a condition often overlooked in existing pessimistic analyses.

Harekrushna Sahu, Pratik Jawanpuria, Pranay Sharma

Distributed Systems & Hardware Training Efficiency & Optimization

Jinda Jia +10·Apr 21, 2026

SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving

Forget fancy quantization schemes – a simple token-wise INT4 quantization with Hadamard rotation is all you need to nearly match FP16 accuracy in LLM serving, without sacrificing throughput.

Jinda Jia, Jisen Li, Zhongzhu Zhou +8

Distributed Systems & Hardware Inference & Quantization

Tao Fan +5·Apr 21, 2026·also PolyU, Shanghai AI Lab

FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion

Achieve centralized-level performance in federated LLM fine-tuning without compromising IP, privacy, or performance on heterogeneous data by using a compressed "proxy" model.

Tao Fan, Guoqiang Ma, Yuanfeng Song +3

Distributed Systems & Hardware Natural Language Processing Training Efficiency & Optimization

Hongwei Xu·Apr 21, 2026

Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems

LLM agents can now collaborate effectively across sessions, sharing and evaluating cognitive states with field-level precision and traceable lineage, thanks to a new "semantic infrastructure" protocol.

Hongwei Xu

Distributed Systems & Hardware Tool Use & Agents

Apr 21, 2026·also University of Kaiserslautern-Landau

Towards Energy Impact on AI-Powered 6G IoT Networks: Centralized vs. Decentralized

Distributed ML slashes energy consumption in 6G IoT networks by up to 70% without sacrificing prediction accuracy, offering a greener path forward.

Anjie Qiu, Donglin Wang, Sanket Partani +2

Distributed Systems & Hardware Training Efficiency & Optimization

I. Thompson +2·Apr 21, 2026

DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs

You can now achieve centralized LLM log anomaly detection performance in federated settings without sacrificing privacy, thanks to parameter-efficient fine-tuning of TinyLLMs.

I. Thompson, Tanmay Sen, Ritwik Bhattacharya

Distributed Systems & Hardware Natural Language Processing Open-Source Models & Weights

Apr 21, 2026·also BUET, Kyung Hee University, PolyU

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing

Attention's quadratic complexity is no longer a bottleneck: DASH-KV achieves linear O(N) inference without sacrificing accuracy by reformulating attention as an approximate nearest-neighbor search.

Yutong Li, Jiehui Xie, Md. Tamim Iqbal +5

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Stanford HAI·Apr 21, 2026·also Macquarie

Are Large Language Models Economically Viable for Industry Deployment?

Forget chasing the biggest LLM – this benchmark reveals that smaller models (<2B params) can deliver 3x better energy efficiency and faster ROI in real-world industry deployments.

Abdullah Mohammad, Sushant Kumar Ray, Pushkar Arora +4

Distributed Systems & Hardware Eval Frameworks & Benchmarks Inference & Quantization

Van Lang University·Apr 21, 2026·also School of Technology

CHRONOS: A Hardware-Assisted Phase-Decoupled Framework for Secure Federated Learning in IoT

Federated learning can be sped up by 74% without sacrificing security, thanks to a novel hardware-assisted approach that cleverly decouples cryptographic setup from the active training phase.

Hung Dang

Distributed Systems & Hardware Inference & Quantization

City University of New York·Apr 21, 2026

A Network-Aware Evaluation of Distributed Energy Resource Control in Smart Distribution Systems

Ignoring realistic communication delays in distributed energy resource control can cause large oscillations in power and voltage violations, even with well-designed algorithms.

Houchao Gan

Distributed Systems & Hardware

College of Semiconductor Research·Apr 21, 2026·also Department of Electrical Engineering, National Tsing Hua University

Silicon Aware Neural Networks

Neural networks made of logic gates can now be directly compiled to silicon, achieving impressive MNIST classification speeds with low power consumption.

Sebastian Fieldhouse, Kea-Tiong Tang

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Luiz Giacomossi +8·Apr 21, 2026·also Mälardalen University, Scuola Superiore Sant'Anna Pisa

Scheduling Analysis of UAV Flight Control Workloads using Raspberry Pi 5 Using PREEMPT_RT Linux

Even with PREEMPT_RT, hardware memory contention on modern SoCs introduces significant jitter for real-time UAV control, highlighting a critical bottleneck beyond OS scheduling.

Luiz Giacomossi, Haakan Forsberg, Håkan Forsberg +6

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Robotics & Embodied AI

L. Pennati +2·Apr 21, 2026·also KTH

Mass Matrix Assembly on Tensor Cores for Implicit Particle-In-Cell Methods

Implicit particle methods get a 3x speed boost by recasting mass matrix assembly into tensor core-friendly matrix multiplications.

L. Pennati, Luca Pennati, Stefano Markidis

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Training Efficiency & Optimization

Lakshani Manamperi +5·Apr 21, 2026·also University of Moratuwa

CROWDio: A Practical Mobile Crowd Computing Framework with Developer-Oriented Design, Adaptive Scheduling, and Fault Resilience

Mobile crowd computing can be practical: CROWDio's developer-friendly SDK and adaptive scheduling achieve significant performance gains on real-world smartphone workloads.

Lakshani Manamperi, Disumi Pathirana, Thiwanka Pathirana +3

Distributed Systems & Hardware Tool Use & Agents

Apr 21, 2026

ReaLB: Real-Time Load Balancing for Multimodal MoE Inference

ReaLB achieves 1.29x faster multimodal MoE inference by dynamically adjusting expert precision, proving that real-time adaptation can overcome modality-induced load imbalances.

Yingping Wang, Yi Wu, Xiangyu Wu +4

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Multimodal Models

Embedded Systems Lab·Apr 21, 2026

Energy Efficient LSTM Accelerators for Embedded FPGAs Through Parameterised Architecture Design

Achieve LSTM acceleration on embedded FPGAs with 11.89 GOP/s/W energy efficiency by tuning architectural parameters.

Chao Qian, Tianheng Ling, Gregor Schiele

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Inference & Quantization

Isaac Lera +1·Apr 21, 2026

YAIFS: Yet (not) Another Intelligent Fog Simulator: A Framework for Agent-Driven Computing Continuum Modeling & Simulation

Forget static simulations – YAIFS lets you build interactive, agent-driven cloud-edge environments controllable via LLMs and multi-agent systems.

Isaac Lera, Carlos Guerrero

Distributed Systems & Hardware Tool Use & Agents World Models & Planning

Apr 21, 2026

Secure Storage and Privacy-Preserving Scanpath Comparison via Garbled Circuits in Eye Tracking

Unlock privacy-preserving eye-tracking analysis with garbled circuits, enabling secure scanpath comparison without revealing sensitive gaze data.

Suleyman Ozdel, Amr Nader, Amr A. Nader +2

Distributed Systems & Hardware Inference & Quantization