Search papers, labs, and topics across Lattice.
This paper addresses the problem of differentially private frequent substring mining in a dataset of user-contributed strings. It introduces a new algorithm that achieves near-optimal error guarantees while significantly reducing space complexity to $O(n \ell+ |\Sigma| )$ and time complexity to $O(n \ell\log |\Sigma| + |\Sigma| )$. The algorithm employs a refined candidate-generation strategy and search space pruning based on frequency relations to avoid quadratic blow-ups of prior approaches.
Unlock scalable, privacy-preserving substring analysis with an algorithm that slashes space and time complexity while maintaining near-optimal accuracy.
Given a dataset of $n$ user-contributed strings, each of length at most $\ell$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced a $\varepsilon$-differentially private algorithm achieving near-optimal error, but at the prohibitive cost of $O(n^2\ell^4)$ space and processing time. In this work, we present a new $\varepsilon$-differentially private algorithm that retains the same near-optimal error guarantees while reducing space complexity to $O(n \ell+ |\Sigma| )$ and time complexity to $O(n \ell\log |\Sigma| + |\Sigma| )$, for input alphabet $\Sigma$. Our approach builds on a top-down exploration of candidate substrings but introduces two new innovations: (i) a refined candidate-generation strategy that leverages the structural properties of frequent prefixes and suffixes, and (ii) pruning of the search space guided by frequency relations. These techniques eliminate the quadratic blow-ups inherent in prior work, enabling scalable frequent substring mining under differential privacy.