Adelaide University; Australian Institute for Machine; Center; Information and Technology; Munich Center for Machine Learning; Munich Data Science Institute; TU Munich

Apr 13, 2026

arXiv:2604.11416

Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

Ajinkya Mohgaonkar, Lukas Gosch, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar, Stephan Günnemann

AI Summary

This paper introduces EnsembleCert, a white-box certification framework for partition-aggregation ensembles that defends against label-flipping attacks. By leveraging white-box knowledge of base classifiers and the equivalence between wide neural networks and kernel methods (via ScaLabelCert), EnsembleCert computes tighter, polynomial-time ensemble-level robustness guarantees compared to black-box approaches. Experiments on CIFAR-10 demonstrate that EnsembleCert certifies up to 26.5% more label flips than existing black-box methods while using significantly fewer partitions.

Key Contribution

Challenging the conventional wisdom that strong certified robustness requires heavy partitioning, this work shows how white-box knowledge of base classifiers in partition-aggregation ensembles can yield significantly tighter robustness guarantees against label-flipping attacks with fewer partitions.
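To make the contrast between black-box and white-box aggregation concrete, here is a minimal Python sketch for a binary partition-aggregation ensemble. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the tie-breaking rule (ties go to the smaller class index), and the `flip_costs` input (the exact minimum number of label flips needed to change each base classifier's prediction) are all hypothetical. Because partitions are disjoint, one flipped label affects only one base classifier, so a black-box bound charges one flip per changed vote, whereas a white-box bound can charge each vote its actual cost.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def partitions_to_change(preds: Sequence[int]) -> Tuple[int, int]:
    """Return (winning class, minimum number of winner-voting base
    classifiers whose vote must change to alter the binary majority vote).

    Assumption: classes are 0/1 and ties go to the smaller class index.
    """
    counts = Counter(preds)
    winner = 0 if counts[0] >= counts[1] else 1
    other = 1 - winner
    gap = counts[winner] - counts[other]
    # If the other class wins ties, reaching a tie is already enough.
    tie_goes_to_other = 1 if other < winner else 0
    k_min = math.ceil((gap + 1 - tie_goes_to_other) / 2)
    return winner, k_min


def black_box_certificate(preds: Sequence[int]) -> int:
    """Certified label flips when each changed vote is assumed to cost
    only a single flip (no knowledge of the base classifiers)."""
    _, k_min = partitions_to_change(preds)
    return k_min - 1


def white_box_certificate(preds: Sequence[int], flip_costs: Sequence[int]) -> int:
    """Certified label flips when flip_costs[i] is the exact minimum number
    of flips inside partition i that changes base classifier i's vote.

    The cheapest attack changes the k_min cheapest winner-voting partitions.
    """
    winner, k_min = partitions_to_change(preds)
    costs = sorted(c for p, c in zip(preds, flip_costs) if p == winner)
    return sum(costs[:k_min]) - 1


preds = [0, 0, 0, 1, 0]          # votes of five base classifiers
flip_costs = [4, 2, 7, 1, 3]     # hypothetical per-partition flip costs
print(black_box_certificate(preds))              # 1
print(white_box_certificate(preds, flip_costs))  # 4
```

In this toy example two votes must change, so the black-box bound certifies 1 flip, while the white-box bound certifies 4 because the two cheapest votes cost 2 + 3 = 5 flips to change. When every cost equals 1, the two bounds coincide. The multi-class case needs a more careful aggregation than this sketch covers.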

Abstract

Label-flipping attacks, which corrupt training labels to induce misclassifications at inference, remain a major threat to supervised learning models. This drives the need for robustness certificates that provide formal guarantees about a model's robustness under adversarially corrupted labels. Existing certification frameworks rely on ensemble techniques such as smoothing or partition-aggregation, but treat the corresponding base classifiers as black boxes, yielding overly conservative guarantees. We introduce EnsembleCert, the first certification framework for partition-aggregation ensembles that utilizes white-box knowledge of the base classifiers. Concretely, EnsembleCert yields tighter guarantees than black-box approaches by aggregating per-partition white-box certificates to compute ensemble-level guarantees in polynomial time. To extract white-box knowledge from the base classifiers efficiently, we develop ScaLabelCert, a method that leverages the equivalence between sufficiently wide neural networks and kernel methods using the neural tangent kernel. ScaLabelCert yields the first exact, polynomial-time computable certificate for neural networks against label-flipping attacks. EnsembleCert is either on par with, or significantly outperforms, the existing partition-based black-box certificates. For example, on CIFAR-10, our method can certify up to 26.5% more label flips in median over the test set than the existing black-box approach while requiring 100 times fewer partitions, thus challenging the prevailing notion that heavy partitioning is a necessity for strong certified robustness.
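One way to build intuition for why kernel-style models can admit exact, polynomial-time label-flipping certificates is that many of them predict with a function that is linear in the training labels. The sketch below is a generic illustration under that assumption and is not ScaLabelCert itself, whose construction the abstract does not spell out. It assumes binary labels in {-1, +1}, a prediction sign(f) with f = sum_i w_i * y_i, and that the per-sample weights `weights` are already given (for kernel regression they would depend on the kernel and the test point but not on the labels). The function name and the tie convention (f = 0 is predicted as +1) are hypothetical.

```python
from typing import Optional, Sequence


def min_flips_to_change(weights: Sequence[float], labels: Sequence[int]) -> Optional[int]:
    """Exact minimum number of label flips that changes sign(sum_i w_i * y_i).

    Assumptions: labels are -1/+1, the weights do not depend on the labels,
    and f == 0 is predicted as +1. Returns None if no set of flips changes
    the prediction.
    """
    f = sum(w * y for w, y in zip(weights, labels))
    s = 1 if f >= 0 else -1          # current prediction
    margin = s * f                   # non-negative by construction
    # Flipping label i lowers the margin by 2 * s * w_i * y_i; only flips
    # with a positive reduction help the attacker, largest first.
    gains = sorted(
        (2 * s * w * y for w, y in zip(weights, labels) if s * w * y > 0),
        reverse=True,
    )
    for k, gain in enumerate(gains, start=1):
        margin -= gain
        # Leaving +1 requires f < 0; leaving -1 requires f >= 0.
        changed = margin < 0 if s == 1 else margin <= 0
        if changed:
            return k
    return None


weights = [0.25, 0.25, 0.25, 0.25]   # hypothetical label-independent weights
labels = [1, 1, 1, 1]
k = min_flips_to_change(weights, labels)
print(k)        # 3 flips are needed to change the prediction
print(k - 1)    # so 2 flips are certified
```

Because each flip changes f by an independent additive amount, taking the largest reductions first is optimal, so sorting yields an exact answer rather than a bound. Such per-model flip counts are the kind of white-box quantity that an ensemble-level aggregation could consume.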

Data Curation & Synthetic Data, Red-Teaming & Adversarial Robustness

Year: 2026

Venue: N/A
