The paper introduces Proxics, a programming model for Near-Data Processing (NDP) accelerators built on familiar OS abstractions: virtual processors and IPC channels. Because naive versions of these abstractions are a poor fit for NDP hardware, it implements lightweight processes and efficient communication channels by exploiting compilation and interconnect protocols. Experiments on a real hardware platform demonstrate performance benefits over CPU-only implementations and highlight the importance of low-latency CPU-NDP communication.
Forget heavyweight processes and bandwidth bottlenecks: Proxics offers a lightweight programming model that unlocks the potential of near-data processing with efficient virtual processors and optimized communication channels.
The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such accelerators are appearing, but clean, portable OS abstractions for programming them are lacking. We propose a programming model for NDP devices based on familiar OS abstractions: virtual processors (processes) and inter-process communication channels (like Unix pipes). While appealing from a user perspective, a naive implementation of such abstractions is inappropriate for NDP accelerators: the paucity of processing power in some hardware designs makes classical processes overly heavyweight, and IPC based on shared buffers makes no sense in a system designed to reduce memory bandwidth. Accordingly, we show how to implement these abstractions in a lightweight and efficient manner by exploiting compilation and interconnect protocols. We demonstrate them on a real hardware platform running applications with a range of memory access patterns, including bulk memory operations, in-memory databases, and graph applications. Crucially, we show not only the benefits over CPU-only implementations, but also the critical importance of efficient, low-latency communication channels between CPU and NDP accelerators, a feature largely neglected in existing proposals.
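The shape of the abstraction described above (a process-like worker reached through a pipe-like channel) can be sketched on an ordinary host. This is an illustration only and not the Proxics API: the names `ndp_worker` and `count_above` are hypothetical, and Python threads and `queue.Queue` merely stand in for NDP cores and interconnect-level channels. The point it tries to show is that the scan runs next to the data and only small messages cross the channel.

```python
import queue
import threading


def ndp_worker(requests, replies, memory):
    """Hypothetical 'virtual processor' that sits next to `memory`.

    It reads small request messages from one channel and writes small
    result messages to another; the bulk data never crosses a channel.
    """
    while True:
        msg = requests.get()
        if msg is None:  # sentinel: the host closed the channel
            break
        lo, hi, threshold = msg
        # The scan happens "near the data"; only one integer is sent back.
        replies.put(sum(1 for v in memory[lo:hi] if v > threshold))


def count_above(memory, threshold, chunk=1000):
    """Host side: send range requests, then add up the per-chunk counts."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=ndp_worker,
                              args=(requests, replies, memory))
    worker.start()

    n_chunks = 0
    for lo in range(0, len(memory), chunk):
        requests.put((lo, min(lo + chunk, len(memory)), threshold))
        n_chunks += 1
    requests.put(None)  # no more requests

    total = sum(replies.get() for _ in range(n_chunks))
    worker.join()
    return total
```

For example, `count_above(list(range(10_000)), 7_499)` returns 2500 after exchanging ten request messages and ten single-integer replies, rather than moving all 10,000 values to the caller. A real NDP channel would additionally have to avoid shared-buffer IPC, which this host-only sketch does not attempt to model.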