This paper analyzes the properties of different uncertainty aggregation strategies for image segmentation and reveals the limitations of the commonly used Global Average approach. The authors propose novel aggregation strategies that incorporate spatial uncertainty structure and show improved performance on out-of-distribution and failure detection tasks across ten diverse datasets. Because performance varies from dataset to dataset, they also introduce a meta-aggregator that combines multiple strategies to perform robustly across datasets.
Simply averaging pixel-level uncertainty in image segmentation throws away crucial spatial information, leading to worse performance on downstream tasks like detecting when your model is likely to fail.
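To see why averaging discards spatial information, the minimal Python sketch below compares two hypothetical 4x4 uncertainty maps with the same mean. In the first map, uncertainty is spread evenly. In the second, it is concentrated in one 2x2 region. Global Average cannot tell the two maps apart. A patch-based aggregator (mean per non-overlapping patch, then the maximum over patches) and a threshold-based aggregator (fraction of pixels above a threshold) can. The function names, patch size, max-over-patches reduction, and threshold value are illustrative assumptions. They are not the paper's definitions or its proposed spatial aggregators.

```python
from typing import List

UncertaintyMap = List[List[float]]  # H x W grid of non-negative pixel-wise uncertainty scores


def global_average(u: UncertaintyMap) -> float:
    """Global Average: mean over all pixels, ignoring where uncertainty sits."""
    values = [v for row in u for v in row]
    return sum(values) / len(values)


def patch_max(u: UncertaintyMap, patch: int) -> float:
    """Hypothetical patch-based aggregator: mean uncertainty inside each
    non-overlapping patch x patch window, then the maximum over windows."""
    h, w = len(u), len(u[0])
    patch_means = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            cells = [u[y][x]
                     for y in range(i, min(i + patch, h))
                     for x in range(j, min(j + patch, w))]
            patch_means.append(sum(cells) / len(cells))
    return max(patch_means)


def threshold_fraction(u: UncertaintyMap, tau: float) -> float:
    """Hypothetical threshold-based aggregator: fraction of pixels above tau."""
    values = [v for row in u for v in row]
    return sum(v > tau for v in values) / len(values)


# Two maps with identical mean uncertainty (0.25) but different spatial structure.
diffuse = [[0.25] * 4 for _ in range(4)]
concentrated = [[1.0 if y < 2 and x < 2 else 0.0 for x in range(4)] for y in range(4)]

for name, u in [("diffuse", diffuse), ("concentrated", concentrated)]:
    print(name, global_average(u), patch_max(u, 2), threshold_fraction(u, 0.5))
# diffuse 0.25 0.25 0.0
# concentrated 0.25 1.0 0.25
```

Both maps score 0.25 under Global Average. The patch-based score jumps from 0.25 to 1.0 for the concentrated map, which is the kind of spatial signal that averaging hides.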
Uncertainty Quantification (UQ) is crucial for ensuring the reliability of automated image segmentations in safety-critical domains like biomedical image analysis or autonomous driving. In segmentation, UQ generates pixel-wise uncertainty scores that must be aggregated into image-level scores for downstream tasks like Out-of-Distribution (OoD) or failure detection. Despite routine use of aggregation strategies, their properties and impact on downstream task performance have not yet been comprehensively studied. Global Average is the default choice, yet it does not account for spatial and structural features of segmentation uncertainty. Alternatives like patch-, class- and threshold-based strategies exist, but lack systematic comparison, leading to inconsistent reporting and unclear best practices. We address this gap by (1) formally analyzing properties, limitations, and pitfalls of common strategies; (2) proposing novel strategies that incorporate spatial uncertainty structure; and (3) benchmarking their performance on OoD and failure detection across ten datasets that vary in image geometry and structure. We find that aggregators leveraging spatial structure yield stronger performance in both downstream tasks studied. However, the performance of individual aggregators depends heavily on dataset characteristics, so we (4) propose a meta-aggregator that integrates multiple aggregators and performs robustly across datasets.
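The abstract does not say how the meta-aggregator combines its members. The Python sketch below shows one possible construction, assumed here purely for illustration. It ranks images under each aggregator separately, so that scores on different scales become comparable, and then averages those ranks per image. The aggregator names and scores are hypothetical. A learned or weighted combination would be an equally plausible design, and this sketch should not be read as the paper's method.

```python
from typing import Dict, List


def average_ranks(scores: List[float]) -> List[float]:
    """1-based ranks of scores (higher score -> higher rank); ties share their mean rank."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def meta_aggregate(per_aggregator: Dict[str, List[float]]) -> List[float]:
    """Hypothetical meta-aggregator: average each image's rank across aggregators.
    Ranking first makes aggregators on different scales comparable."""
    rank_lists = [average_ranks(s) for s in per_aggregator.values()]
    n_images = len(rank_lists[0])
    return [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n_images)]


# Hypothetical image-level scores for three images from three aggregators.
scores = {
    "global_average": [0.25, 0.25, 0.10],
    "patch_max": [0.25, 1.00, 0.20],
    "threshold_fraction": [0.00, 0.25, 0.05],
}
combined = meta_aggregate(scores)
print([round(c, 3) for c in combined])
# [1.833, 2.833, 1.333]
```

Image 1 gets the highest combined rank because two of the three aggregators flag it as most uncertain, even though Global Average ties it with image 0.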