The paper introduces GridPilot, a three-tier predictive controller that enables real-time, grid-responsive control of AI supercomputers. It addresses the mismatch between growing data-center electricity demand and the grid's need for flexible loads by translating grid requests into rapid GPU power adjustments. On a three-GPU testbed, the system achieves a measured 97.2 ms trigger-to-target response time, well within the 700 ms requirement of Nordic Fast Frequency Reserve. It also applies an instantaneous Power Usage Effectiveness (PUE) correction so that commitments hold at the facility meter rather than only at the IT load.
AI supercomputers can react to grid demands faster than you think: GridPilot achieves sub-100ms response times, opening the door for AI to stabilize power grids.
At global scale, data-center electricity demand is growing faster than the grids that supply it, while system operators increasingly require large flexible loads that can adjust power within seconds to absorb variable wind and solar generation. For multi-megawatt AI/HPC facilities, the key unresolved question is practical and measurable: how quickly can the software stack translate a grid request into a real change in GPU power at the facility meter, where commitments are settled? We answer this on real hardware with GridPilot, a three-tier predictive controller operating across milliseconds, seconds, and hours, augmented by a deterministic safety-island bypass for fast response. On a three-GPU NVIDIA V100 testbed, GridPilot achieves a measured end-to-end trigger-to-target response of 97.2 ms, which is 6.9x faster than the 700 ms requirement of Nordic Fast Frequency Reserve. We further incorporate an instantaneous Power Usage Effectiveness (PUE) correction so dispatched commitments remain robust at meter level rather than only at IT load level. In replay experiments across six representative European grids (from Sweden to Poland), the PUE-aware controller closes 2.5-5.8 percentage points of cooling-overhead drag. GridPilot is released as open source and serves as a proof of concept that MW-scale AI/HPC demand can be engineered as controllable, grid-responsive flexibility by design.
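To make the meter-level versus IT-level distinction concrete, here is a minimal sketch of a PUE correction. It assumes the simple proportional model implied by the definition of PUE (facility power = PUE × IT power) and is not GridPilot's actual implementation; the function names and the numbers are hypothetical and chosen only for illustration.

```python
def meter_power_kw(it_power_kw: float, pue: float) -> float:
    """Facility-meter power implied by an IT load under the assumed model."""
    return it_power_kw * pue


def it_target_kw(meter_target_kw: float, pue: float) -> float:
    """IT-level (GPU) power target that yields the requested meter-level power."""
    if pue < 1.0:
        # PUE is facility power divided by IT power, so it cannot be below 1.
        raise ValueError("PUE must be >= 1.0")
    return meter_target_kw / pue


# Hypothetical values for illustration only; not taken from the paper.
pue_now = 1.25               # assumed instantaneous PUE reading
committed_meter_kw = 1000.0  # assumed power committed at the facility meter
gpu_setpoint_kw = it_target_kw(committed_meter_kw, pue_now)
```

Under these assumptions, a controller that dispatched the commitment directly as a GPU setpoint would miss the meter-level target by the cooling and distribution overhead; dividing by the current PUE is one simple way to account for it.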