This paper addresses the problem of continuously auditing machine learning systems across $k$ data streams to detect unusual behavior, framed as sequential hypothesis testing. The authors construct new sequential tests by merging test martingales, which trade off expected stopping time under sparse versus dense alternative hypotheses. They derive a balanced test whose expected stopping time bound matches Bonferroni's in the sparse setting and improves to $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under dense alternatives, where it outperforms the Bonferroni correction.
Forget Bonferroni: a new sequential testing approach slashes audit times for multi-stream ML systems, especially when anomalies are widespread.
Across many risk-sensitive areas, it is critical to continuously audit the performance of machine learning systems and detect any unusual behavior quickly. This can be modeled as a sequential hypothesis testing problem with $k$ incoming streams of data and a global null hypothesis that asserts that the system is working as expected across all $k$ streams. The standard global test employs a Bonferroni correction and has an expected stopping time bound of $O\left(\ln\frac{k}{\alpha}\right)$ when $k$ is large and the significance level of the test, $\alpha$, is small. In this work, we construct new sequential tests by merging test martingales, with different trade-offs in expected stopping time under sparse or dense alternative hypotheses. We further derive a new, balanced test whose expected stopping time bound matches Bonferroni's in the sparse setting but improves to $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under a dense alternative. We empirically demonstrate the effectiveness of our proposed tests on synthetic and real-world data.
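The abstract does not spell out how the test martingales are merged, so the following is only a minimal sketch of one common way to do it, not the authors' construction. It assumes independent streams, unit-variance Gaussian observations with null mean 0, and a fixed bet `lam`; the function names (`stopping_times`, `simulate`) and the choice of an average merge, a product merge, and an equal-weight mixture of the two as the "balanced" test are all illustrative assumptions. Each merged process is a nonnegative martingale under the global null, so rejecting once it reaches $1/\alpha$ controls the type I error at level $\alpha$ by Ville's inequality.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def stopping_times(streams, alpha=0.05, lam=0.5):
    """Return the first rejection time (1-indexed) of each merged test.

    streams: list of k equal-length lists of observations.
    A value of None means that test never rejected within the data.
    """
    k = len(streams)
    horizon = len(streams[0])
    log_wealth = [0.0] * k  # log of each per-stream test martingale
    threshold = math.log(1.0 / alpha)
    stops = {"bonferroni": None, "average": None, "product": None, "balanced": None}

    for t in range(1, horizon + 1):
        for i in range(k):
            x = streams[i][t - 1]
            # Assumed bet: likelihood ratio of N(lam, 1) against the null N(0, 1).
            log_wealth[i] += lam * x - 0.5 * lam * lam

        log_avg = logsumexp(log_wealth) - math.log(k)  # average of martingales
        log_prod = sum(log_wealth)  # product; a martingale only if streams are independent
        log_balanced = logsumexp([log_avg, log_prod]) - math.log(2.0)  # 50/50 mixture

        stats = {
            # Bonferroni: some single stream's wealth reaches k / alpha.
            "bonferroni": max(log_wealth) - math.log(k),
            "average": log_avg,
            "product": log_prod,
            "balanced": log_balanced,
        }
        for name, value in stats.items():
            if stops[name] is None and value >= threshold:
                stops[name] = t
        if all(s is not None for s in stops.values()):
            break
    return stops


def simulate(k, horizon, shifted, mu, seed):
    """Draw k unit-variance Gaussian streams; the first `shifted` have mean mu."""
    rng = random.Random(seed)
    return [
        [rng.gauss(mu if i < shifted else 0.0, 1.0) for _ in range(horizon)]
        for i in range(k)
    ]
```

As a deterministic check, `stopping_times([[1.0] * 20 for _ in range(4)])` returns `{"bonferroni": 12, "average": 8, "product": 2, "balanced": 3}`. When every stream drifts, the product accumulates log-wealth roughly $k$ times faster than any single stream, which is the intuition behind a $\frac{1}{k}\ln\frac{1}{\alpha}$ rate. When only a few streams drift, the product can shrink and never reject, while the mixture still inherits the average's behavior at the cost of a factor of 2 in the threshold. Calling `simulate(20, 500, 20, 0.5, seed)` gives a dense scenario and `simulate(20, 500, 1, 0.5, seed)` a sparse one for comparison.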