The paper addresses the challenge of attributing a computer system's overall performance to individual components like the CPU, which is difficult due to their interdependence. It demonstrates that standard methodologies like Design of Experiments (DoE) and Randomized Controlled Trials (RCTs) are inefficient for this task, and that even industry benchmarks like SPEC CPU2017 suffer from uncontrolled variability due to undefined configurations of other components. The authors propose and validate a new methodology that enables cost-efficient attribution of system-level effects to specific components through controlled experiments and theoretical analysis.
For an identical CPU, scores on the industry-standard SPEC CPU2017 benchmark fluctuate by 12.16% to 436.80% because the configurations of other system components are left undefined, highlighting a critical flaw in current evaluation practices.
In a computer system, multiple indispensable components, such as the CPU and memory, work together to produce an overall effect, which can only be measured on an independently running system. Because the system operates as an integrated whole, isolating the effect of an individual component is challenging. Accurately attributing the system's overall effect to a specific component is crucial for both computer design and evaluation. Taking CPU evaluation as an example, our experiments reveal that general-purpose rigorous methodologies, such as DoE and RCTs, cannot address this issue efficiently, while SPEC CPU2017, a single-purpose empirical methodology and the industry-standard CPU benchmark, reports only the overall effect. More concerning still, for an identical CPU, the undefined configurations of the other indispensable components introduce uncontrolled variability, with SPEC scores fluctuating from 12.16% to 436.80%. We propose a rigorous methodology that attributes the overall effect to a specific component and that can be used in computer component evaluation and design, as well as in other areas. Through theoretical analysis and pioneering controlled experiments, we systematically compare our methodology against three established methodologies: SPEC CPU2017, DoE, and RCTs. The results show that our methodology achieves its goal cost-efficiently, while the others exhibit inherent limitations.