Feb 16, 2026

arXiv:2602.14855

A Pragmatic Method for Comparing Clusterings with Overlaps and Outliers

Ryan DeWolfe, Paweł Prałat, François Théberge

AI Summary

This paper introduces a similarity measure for comparing clusterings that explicitly handles overlaps and outliers, addressing a gap in extrinsic clustering evaluation. The authors show that the measure has several desirable properties, and their experiments indicate that it is not subject to several common biases affecting other clustering comparison measures.

Key Contribution

A new clustering similarity measure that handles overlaps and outliers, enabling extrinsic evaluation of clustering algorithms when the detected or ground truth clustering contains either.

Abstract

Clustering algorithms are an essential part of the unsupervised data science ecosystem, and extrinsic evaluation of clustering algorithms requires a method for comparing the detected clustering to a ground truth clustering. In a general setting, the detected and ground truth clusterings may have outliers (objects belonging to no cluster), overlapping clusters (objects may belong to more than one cluster), or both, but methods for comparing these clusterings are currently undeveloped. In this note, we define a pragmatic similarity measure for comparing clusterings with overlaps and outliers, show that it has several desirable properties, and experimentally confirm that it is not subject to several common biases afflicting other clustering comparison measures.
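
To make the setting concrete, here is a minimal Python sketch of how clusterings with overlaps and outliers might be represented. It is an illustration only and does not implement the paper's measure, which the abstract does not specify. The list-of-sets representation, the helper names `outliers` and `overlapping`, and the toy data are all assumptions introduced here.

```python
from collections import Counter

def outliers(universe, clustering):
    """Objects in the universe that belong to no cluster."""
    covered = set().union(*clustering)
    return set(universe) - covered

def overlapping(clustering):
    """Objects that belong to more than one cluster."""
    counts = Counter(obj for cluster in clustering for obj in cluster)
    return {obj for obj, n in counts.items() if n > 1}

# Hypothetical toy data: eight objects labelled 0..7.
universe = set(range(8))

# Ground truth: object 3 sits in two clusters; objects 6 and 7 are outliers.
ground_truth = [{0, 1, 2, 3}, {3, 4, 5}]

# Detected: a partition of objects 0..6; object 7 is an outlier.
detected = [{0, 1, 2}, {3, 4, 5, 6}]

print(outliers(universe, ground_truth), overlapping(ground_truth))
print(outliers(universe, detected), overlapping(detected))
```

Under this representation, a standard partition is the special case where both helpers return the empty set. A comparison measure for the general setting has to stay well defined when either set is non-empty in either clustering.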

Eval Frameworks & Benchmarks

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
