Lancaster UniversityApr 16, 2026arXiv:2604.14810

Scalable Model-Based Clustering with Sequential Monte Carlo

C. Trojan, Connie Trojan, P. S Myshkov, Pavel Myshkov, P. Fearnhead, Paul Fearnhead, James Hensman, James Hensman, Tom Minka, Tom Minka, Christopher Nemeth, Chris Nemeth

AI Summary

This paper introduces a novel Sequential Monte Carlo (SMC) algorithm for scalable online clustering that decomposes the problem into approximately independent subproblems to reduce memory requirements. The method maintains a tractable representation of uncertainty over cluster assignments, even with complex cluster distributions like those found in text data. Experiments demonstrate the algorithm's accuracy and efficiency in knowledge base construction and other settings where traditional SMC methods fail due to memory constraints.

Key Contribution

Overcome the memory bottleneck of Sequential Monte Carlo clustering with a decomposition that enables scaling to large datasets without sacrificing accuracy.

Abstract

In online clustering problems, there is often a large amount of uncertainty over possible cluster assignments that cannot be resolved until more data are observed. This difficulty is compounded when clusters follow complex distributions, as is the case with text data. Sequential Monte Carlo (SMC) methods give a natural way of representing and updating this uncertainty over time, but have prohibitive memory requirements for large-scale problems. We propose a novel SMC algorithm that decomposes clustering problems into approximately independent subproblems, allowing a more compact representation of the algorithm state. Our approach is motivated by the knowledge base construction problem, and we show that our method is able to accurately and efficiently solve clustering problems in this setting and others where traditional SMC struggles.

Architecture Design (Transformers, SSMs, MoE)Distributed Systems & Hardware Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Scalable Model-Based Clustering with Sequential Monte Carlo

Related Papers