Apr 29, 2026 · arXiv:2604.27162

A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

AI Summary

The paper introduces Hide-And-Seek-Engine (HASE), a compute-efficient C++ engine for Dec-POMDPs designed to accelerate multi-agent reinforcement learning. HASE combines Data-Oriented Design (DOD), explicit 64-byte cache-line alignment, and a zero-copy PyTorch memory bridge to achieve high throughput. The engine reaches up to 33 million steps per second in a single-agent, 1024-environment setting and trains cooperative multi-agent policies with PPO, DQN, and SAC in minutes.

Key Contribution

Environment stepping for multi-agent RL runs approximately 3,500x faster than a single-threaded vectorized NumPy baseline, thanks to a new engine that optimizes for memory access and data locality.

Abstract

Reinforcement Learning (RL) algorithms exhibit high sample complexity, particularly when applied to Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). In response, projects such as SampleFactory, EnvPool, Brax, and IsaacLab migrate parallel execution of classic environments such as MuJoCo and Atari into C++ thread pools or onto the GPU to decrease the computational cost of environment steps. We are interested in optimizing the decision level of human-AI joint operations, so we introduce a compute-efficient Dec-POMDP engine natively architected in C++ called Hide-And-Seek-Engine. By employing Data-Oriented Design (DOD) principles, explicit 64-byte cache-line alignment to remove false sharing, and a zero-copy PyTorch memory bridge using pinned memory and Direct Memory Access (DMA), our engine sustains throughput of up to 33,000,000 steps per second (SPS) in a single-agent, 1024-environment configuration with decentralized observations on an AMD Ryzen 9950X (16 cores). With ten agents, throughput drops to 7M SPS, with random action generation accounting for one third of the total runtime (noted for reference). The engine achieves a throughput increase of approximately 3,500× over the baseline single-threaded vectorized NumPy implementation and successfully trains cooperative multi-agent policies via PPO, DQN, and SAC in minutes, validating both its performance and generality.
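
To make two of the techniques named in the abstract concrete, the following is a minimal C++17 sketch of a structure-of-arrays (data-oriented) environment batch and per-worker counters padded to 64-byte cache lines to avoid false sharing. It is not taken from HASE: the names `EnvBatch`, `WorkerCounter`, and `step_range`, the state fields, and the trivial "move by (dx, dy)" dynamics are all hypothetical, chosen only to illustrate the layout ideas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Cache-line size assumed from the abstract's "64-byte cache-line alignment".
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-worker statistic. alignas pads the struct to a full cache
// line, so counters owned by different threads never share a line and writes
// by one worker do not invalidate another worker's line (false sharing).
struct alignas(kCacheLine) WorkerCounter {
    std::uint64_t steps = 0;
};
static_assert(sizeof(WorkerCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(WorkerCounter) == kCacheLine, "counter starts on a line boundary");

// Hypothetical structure-of-arrays state: each field is stored contiguously
// across all environments, so a loop over one field streams through memory.
struct EnvBatch {
    std::vector<float> agent_x;
    std::vector<float> agent_y;
    std::vector<std::uint8_t> done;

    explicit EnvBatch(std::size_t n) : agent_x(n, 0.0f), agent_y(n, 0.0f), done(n, 0) {}
    std::size_t size() const { return agent_x.size(); }
};

// Steps environments in [begin, end) with placeholder dynamics and records
// the number of steps in the calling worker's own padded counter.
inline void step_range(EnvBatch& batch, std::size_t begin, std::size_t end,
                       float dx, float dy, WorkerCounter& counter) {
    for (std::size_t i = begin; i < end; ++i) {
        batch.agent_x[i] += dx;
        batch.agent_y[i] += dy;
    }
    counter.steps += end - begin;
}
```

In a multi-threaded setting, each worker would call `step_range` on its own disjoint slice of the batch with its own `WorkerCounter`; because the slices and counters do not overlap, no locking is needed in this sketch. The zero-copy PyTorch bridge is omitted here since it depends on libtorch and pinned-memory allocation details the abstract does not specify.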

Distributed Systems & Hardware · Robotics & Embodied AI · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
