UWApr 23, 2026arXiv:2604.21798

An effective variant of the Hartigan $k$-means algorithm

AI Summary

This paper introduces a minor modification to Hartigan's k-means algorithm, a known improvement over Lloyd's algorithm for clustering. The proposed variant consistently achieves an additional 2%-5% improvement in clustering performance compared to the original Hartigan's method. The performance gain is shown to increase with higher dimensionality or a larger number of clusters (k).

Key Contribution

A surprisingly simple tweak to Hartigan's k-means algorithm unlocks another 2-5% accuracy boost, especially when clustering high-dimensional data.

Abstract

The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases, Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either dimension or $k$ increase.

Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References21

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

An effective variant of the Hartigan $k$-means algorithm

Related Papers