This paper introduces a post-processing correction technique to preserve single-linkage clustering in lossy compressed particle data, addressing the critical need to maintain structural integrity for downstream scientific analysis. The method identifies vulnerable particle pairs using spatial partitioning and neighborhood search, then enforces clustering consistency via projected gradient descent. Experiments on cosmology and molecular dynamics datasets demonstrate effective cluster preservation with competitive compression ratios compared to existing compressors like SZ3 and Draco.
Even state-of-the-art lossy compression can destroy crucial clustering information in particle data, but this post-processing correction technique brings it back from the dead.
Lossy compression is widely used to reduce storage and I/O costs for large-scale particle datasets in scientific applications such as cosmology, molecular dynamics, and fluid dynamics, where clustering structures (e.g., single-linkage or Friends-of-Friends) are critical for downstream analysis. However, existing compressors typically provide only pointwise error bounds on particle positions and offer no guarantees on preserving clustering outcomes; even small perturbations can alter cluster connectivity and compromise scientific validity. We propose a correction-based technique to preserve single-linkage clustering under lossy compression, operating on decompressed data from off-the-shelf compressors such as SZ3 and Draco. Our key contributions are threefold: (1) a clustering-aware correction algorithm that identifies vulnerable particle pairs via spatial partitioning and local neighborhood search; (2) an optimization-based formulation that enforces clustering consistency using projected gradient descent with a loss that encodes pairwise distance violations; and (3) a scalable GPU-accelerated and distributed implementation for large-scale datasets. Experiments on cosmology and molecular dynamics datasets show that our method effectively preserves clustering results while maintaining competitive compression performance compared with SZ3, ZFP, Draco, LCP, and space-filling-curve-based schemes.
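To make the two algorithmic steps in the abstract concrete, here is a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give these details, so several things are assumptions: the linking length `b`, a per-coordinate error bound `eps`, the choice to project onto the box of width `eps` around the original positions, the squared-hinge loss with a small `margin`, and all function names (`pairs_within`, `clusters`, `correct`). It also preserves every linked or unlinked pair, which is a sufficient but stricter condition than preserving the clusters themselves, and it runs on the CPU rather than the GPU.

```python
import math
from collections import defaultdict
from itertools import product


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairs_within(points, radius):
    """Return {(i, j): distance} for every i < j with distance <= radius.

    Spatial partitioning: particles are hashed into a uniform grid with cell
    side `radius`, so only the 3^dim neighboring cells are searched.
    """
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(math.floor(c / radius) for c in p)].append(i)
    grid = dict(grid)
    found = {}
    for key, members in grid.items():
        for offset in product((-1, 0, 1), repeat=len(key)):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in grid.get(neighbor, ()):
                    if i < j:
                        d = dist(points[i], points[j])
                        if d <= radius:
                            found[(i, j)] = d
    return found


def clusters(points, b):
    """Single-linkage (Friends-of-Friends) clusters for linking length b."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in pairs_within(points, b):
        parent[find(i)] = find(j)
    groups = defaultdict(set)
    for i in range(len(points)):
        groups[find(i)].add(i)
    return {frozenset(g) for g in groups.values()}


def add_pair_grad(grad, cur, i, j, coef):
    for k in range(len(cur[i])):
        g = coef * (cur[i][k] - cur[j][k])
        grad[i][k] += g
        grad[j][k] -= g


def correct(orig, dec, b, eps, margin=1e-3, lr=0.25, max_iter=200):
    """Projected gradient descent on a squared-hinge pairwise-distance loss.

    Assumed constraint set: each corrected coordinate stays within `eps` of
    the original one. With lr=0.25 an isolated violating pair reaches its
    target distance in a single step.
    """
    want = set(pairs_within(orig, b))  # links that must exist
    cur = [list(p) for p in dec]
    for _ in range(max_iter):
        near = pairs_within(cur, b + margin)
        have = {ij for ij, d in near.items() if d <= b}
        if have == want:
            break
        grad = [[0.0] * len(p) for p in cur]
        for i, j in want:  # should be linked: pull together if too far
            d = dist(cur[i], cur[j])
            if d > b - margin:
                add_pair_grad(grad, cur, i, j, 2 * (d - (b - margin)) / d)
        for (i, j), d in near.items():  # should be unlinked: push apart
            if (i, j) not in want and d > 0:
                add_pair_grad(grad, cur, i, j, -2 * ((b + margin) - d) / d)
        for i, p in enumerate(cur):  # gradient step, then box projection
            for k in range(len(p)):
                x = p[k] - lr * grad[i][k]
                p[k] = min(max(x, orig[i][k] - eps), orig[i][k] + eps)
    return cur


# Toy example: the decompressed data breaks the link between particles 0 and 1.
orig = [(0.0, 0.0), (0.9, 0.0), (3.0, 0.0)]
dec = [(-0.08, 0.0), (0.98, 0.0), (3.0, 0.0)]
fixed = correct(orig, dec, b=1.0, eps=0.1)
print(clusters(dec, 1.0) == clusters(orig, 1.0))    # False
print(clusters(fixed, 1.0) == clusters(orig, 1.0))  # True
```

In this sketch the correction needs the original positions, so it would have to run on the compression side, with the resulting adjustments stored alongside the compressed stream. How the paper actually encodes and stores its corrections is not stated in the abstract.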