This paper introduces ZK-Value, a practical zero-knowledge system for verifiable data valuation in data marketplaces using Shapley value attribution. To improve scalability, the authors propose LSH-Shapley, a locality-sensitive hashing-based valuation primitive, and ZK-LSH-Shapley, a tailored ZKP protocol that encodes collision counts into bucket-level histograms. Experiments on 12 datasets show that ZK-Value achieves valuation quality comparable to KNN-Shapley while significantly reducing proving time relative to existing ZK baselines.
Finally, a zero-knowledge data valuation system that scales: ZK-Value proves Shapley values in seconds to minutes, beating specialized ZK baselines by over an order of magnitude.
Data valuation is a foundational task in data marketplaces, where a Shapley-value attribution determines how a buyer's payment is distributed among data providers. Typically, the marketplace operator runs this attribution alone, requiring participants and external auditors to trust scores they cannot independently recompute on the underlying private data. While zero-knowledge proofs (ZKPs) can theoretically reconcile this conflict between privacy and verifiability, existing ZK valuation systems fail to scale to real-world marketplace demands due to prohibitive proving times or the requirement to disclose validation cohorts. We present ZK-Value, a practical, end-to-end ZK data-valuation system. Our solution bridges the scalability gap through a fully co-designed architecture: (1) LSH-Shapley, a locality-based valuation primitive that replaces expensive pairwise distance metrics with per-bucket collision counts; (2) ZK-LSH-Shapley, a tailored ZKP protocol that drastically reduces witness size by encoding these counts into bucket-level histograms rather than naive per-pair tensors; and (3) structural proof-system optimizations, specifically super-oracle batching and sparsity skipping. Evaluated across 12 standard datasets, ZK-Value delivers valuation quality on par with state-of-the-art baselines (within 0.033 AUROC of exact KNN-Shapley), while generating proofs in seconds to minutes and outperforming specialized ZK baselines by 12.6x to 68.1x in proving time, with verification in under 4.6 s.
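The abstract does not spell out the LSH-Shapley scoring rule, so the following is only a minimal sketch of the general idea it describes: hash points into buckets, summarize the validation set as per-bucket label histograms, and score each training point from the histogram of its own bucket instead of from pairwise distances. The random-hyperplane hash, the function names (`make_hyperplanes`, `bucket_of`, `bucket_histograms`, `collision_scores`), and the "agreeing minus disagreeing collisions" score are all illustrative assumptions, not the paper's actual construction.

```python
import random
from collections import Counter, defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed hash family: sign projections)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def bucket_of(x, planes):
    """Bucket id = sign pattern of x projected onto each hyperplane."""
    return tuple(int(sum(a * b for a, b in zip(x, p)) >= 0) for p in planes)


def bucket_histograms(points, labels, planes):
    """Map each bucket to a Counter of label -> number of points in that bucket."""
    hist = defaultdict(Counter)
    for x, y in zip(points, labels):
        hist[bucket_of(x, planes)][y] += 1
    return hist


def collision_scores(train_x, train_y, val_x, val_y, planes):
    """Score each training point from the validation histogram of its bucket.

    Hypothetical scoring rule: (validation collisions with the same label)
    minus (validation collisions with a different label).
    """
    val_hist = bucket_histograms(val_x, val_y, planes)
    scores = []
    for x, y in zip(train_x, train_y):
        h = val_hist.get(bucket_of(x, planes), Counter())
        agree = h[y]                 # same bucket, same label
        total = sum(h.values())      # all validation points in the bucket
        scores.append(agree - (total - agree))
    return scores


# Tiny example with one fixed hyperplane, so buckets split on the sign of x[0].
planes = [[1.0, 0.0]]
train_x, train_y = [(1.0, 0.0), (-1.0, 0.0)], [1, 1]
val_x, val_y = [(2.0, 1.0), (3.0, -1.0), (-2.0, 0.0)], [1, 1, 0]
scores = collision_scores(train_x, train_y, val_x, val_y, planes)
print(scores)
```

In this sketch the only data a scorer needs from the validation set is the histogram, which holds at most (number of buckets) x (number of labels) counts, whereas a per-pair representation would hold one entry for every training/validation pair. That is the kind of size gap the abstract attributes to bucket-level histograms versus per-pair tensors.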