Search papers, labs, and topics across Lattice.
This paper introduces a minor modification to Hartigan's k-means algorithm, a known improvement over Lloyd's algorithm for clustering. The proposed variant consistently achieves an additional 2%-5% improvement in clustering performance compared to the original Hartigan's method. The performance gain is shown to increase with higher dimensionality or a larger number of clusters (k).
A surprisingly simple tweak to Hartigan's k-means algorithm unlocks another 2-5% accuracy boost, especially when clustering high-dimensional data.
The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases, Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either dimension or $k$ increase.