This paper analyzes the transient-fault sensitivity of the Spatz RISC-V vector cluster under single-event transient (SET) and single-event upset (SEU) fault-injection models across six MatMul and Widening MatMul configurations. Faulty data corruption (FD) is the most prevalent error manifestation. SET sensitivity is concentrated in the vector execution path, while the TCDM is the major contributor to FD manifestations. The study also quantifies the severity of silent data corruption (SDC) across floating-point formats, highlighting the vulnerability of exponent bits and the comparatively low output impact of FP8.
Exponent bits are the Achilles' heel of floating-point arithmetic, as corrupting them in RISC-V vector processors leads to the most severe silent data corruption.
We present a transient-fault sensitivity study of the open-source RISC-V vector cluster Spatz under SET and SEU fault models. Across 100,000 fault injections on six MatMul and Widening MatMul configurations, faulty data corruption (FD) is the dominant manifesting outcome for all evaluated workloads, accounting for at least 86% of manifesting errors in the SET campaigns and at least 91% in the SEU campaigns. At the module level, SET sensitivity is concentrated in the vector execution path, while TCDM is the major contributor to FD manifestations. We further quantify SDC severity across FP32, FP16, BF16, and FP8 by analyzing both the average number of corrupted outputs and their RMSE. FP8 shows the lowest output impact overall, while FP16 Widening MatMul reduces both corruption spread and RMSE compared with FP16 MatMul. By contrast, the effect of widening on FP8 is limited in our experiments. Finally, exponent-targeted corruptions induce the most severe SDC events, with the largest deviations observed in FP32 and BF16, motivating selective protection of the highest-impact datapaths and fault cases.
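The two SDC severity metrics named in the abstract (number of corrupted outputs and RMSE) can be illustrated with a small software sketch. This is not the paper's fault-injection flow, which targets the Spatz hardware under SET and SEU models. It only assumes that a fault ends up as a single flipped bit in one MatMul input operand. The helper names `flip_bit`, `matmul`, `sdc_metrics`, and `inject` are hypothetical, only FP32 and FP16 are modeled, and accumulation runs in Python double precision rather than in the target format.

```python
import math
import struct

# Hypothetical format table: (float code, same-width unsigned code, bit width).
# Only FP32 and FP16 are modeled, because the struct module supports them natively.
FORMATS = {"FP32": ("<f", "<I", 32), "FP16": ("<e", "<H", 16)}


def flip_bit(value, bit, fmt="FP32"):
    """Encode `value` in `fmt`, invert one bit (bit 0 = LSB), and decode it again."""
    float_code, int_code, width = FORMATS[fmt]
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} is outside a {width}-bit word")
    (raw,) = struct.unpack(int_code, struct.pack(float_code, value))
    (faulty,) = struct.unpack(float_code, struct.pack(int_code, raw ^ (1 << bit)))
    return faulty


def matmul(a, b):
    """Plain list-of-lists matrix product (accumulates in double precision)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sdc_metrics(golden, faulty):
    """Return (number of differing outputs, RMSE over all outputs)."""
    g = [v for row in golden for v in row]
    f = [v for row in faulty for v in row]
    corrupted = sum(1 for x, y in zip(g, f) if x != y)
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(g, f)) / len(g))
    return corrupted, rmse


def inject(a, b, row, col, bit, fmt="FP32"):
    """Flip one bit of a[row][col] and compare the result with the fault-free run."""
    faulty_a = [list(r) for r in a]
    faulty_a[row][col] = flip_bit(a[row][col], bit, fmt)
    return sdc_metrics(matmul(a, b), matmul(faulty_a, b))


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[1.0, 1.0], [1.0, 1.0]]
    # FP32 layout: bit 31 = sign, bits 30..23 = exponent, bits 22..0 = mantissa.
    for label, bit in [("mantissa LSB", 0), ("mantissa MSB", 22), ("exponent LSB", 23)]:
        corrupted, rmse = inject(a, b, 0, 0, bit, "FP32")
        print(f"{label:13s} bit {bit:2d}: corrupted={corrupted} rmse={rmse:.3e}")
```

In this toy setup a fault in one element of `a` spreads to every output in the same row, and the RMSE depends strongly on which bit is hit. Flipping the lowest mantissa bit barely moves the result, while an exponent flip rescales the operand by a power of two or turns it into infinity or NaN, which is consistent with the abstract's observation that exponent-targeted corruptions are the most severe.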