Strix is presented as a full-stack NPU reliability framework that addresses the increasing frequency of hardware faults in DNN/LLM accelerators. It re-partitions the NPU along the system inference pipeline, identifies the dominant failure modes of each stage, and applies targeted safeguards. On an open-source SoC, the framework achieves sub-microsecond fault localization, error detection, and correction with only a 1.04× slowdown and minimal hardware overhead.
Fine-grained partitioning and targeted safeguards can provide robust NPU reliability with minimal performance overhead, challenging the assumption that coarse-grained redundancy is the only path forward.
DNNs and LLMs increasingly rely on hardware accelerators, including in safety-critical domains, while technology scaling and growing model complexity make hardware faults more frequent. Existing system-level mechanisms typically treat the NPU as a monolithic unit, using coarse-grained replication that incurs prohibitive performance and hardware overheads, leaving a gap between reliability requirements and deployable solutions. To bridge this gap, we present Strix, a full-stack NPU reliability framework on an open-source SoC, spanning micro-architecture, ISA, and programming methods. Strix re-partitions the NPU along the system inference pipeline, identifies dominant failure modes, and attaches targeted safeguards, achieving sub-microsecond fault localisation, error detection, and correction with only a 1.04× slowdown and minimal hardware overhead.
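The abstract does not detail Strix's safeguards. As a generic illustration of what "localisation, detection, and correction" can mean for a single compute stage of an accelerator, here is a minimal sketch of algorithm-based fault tolerance (ABFT) with row and column checksums on an integer matrix multiply. ABFT is a well-known, swapped-in technique, not Strix's mechanism, and every function name below is hypothetical. It assumes at most one faulty output element and exact integer arithmetic; floating-point data would need a comparison tolerance.

```python
# Generic ABFT checksum sketch (NOT Strix's mechanism; names are hypothetical).
# Assumes integer matrices and at most one corrupted output element.
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain C = A @ B, standing in for the accelerator's compute stage."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def checksums(a: Matrix, b: Matrix) -> Tuple[List[int], List[int]]:
    """Expected row and column sums of C, computed from the operands only."""
    k = len(b)
    col_of_a = [sum(row[p] for row in a) for p in range(k)]  # 1^T A
    row_of_b = [sum(b[p]) for p in range(k)]                 # B 1
    expected_row = [sum(a[i][p] * row_of_b[p] for p in range(k))
                    for i in range(len(a))]                  # A (B 1)
    expected_col = [sum(col_of_a[p] * b[p][j] for p in range(k))
                    for j in range(len(b[0]))]               # (1^T A) B
    return expected_row, expected_col


def detect_and_correct(c: Matrix, expected_row: List[int],
                       expected_col: List[int]) -> Optional[Tuple[int, int]]:
    """Detect a fault, localise it to (row, col), and correct C in place."""
    bad_rows = [i for i, r in enumerate(c) if sum(r) != expected_row[i]]
    bad_cols = [j for j in range(len(c[0]))
                if sum(r[j] for r in c) != expected_col[j]]
    if not bad_rows and not bad_cols:
        return None  # no fault detected
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row-sum discrepancy equals the error injected at c[i][j].
        c[i][j] += expected_row[i] - sum(c[i])
        return (i, j)
    raise RuntimeError("multiple faulty elements: detected but not correctable")


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(a, b)
    er, ec = checksums(a, b)
    c[1][0] += 100  # inject a single-element fault
    print(detect_and_correct(c, er, ec), c)
```

Running the demo prints `(1, 0) [[19, 22], [43, 50]]`: the intersecting failing row and column localise the fault, and the row-sum discrepancy restores the correct value. The design choice here is that checksums are derived from the operands rather than from the output, so a fault in the compute stage cannot silently corrupt both the result and its check.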