This paper introduces a novel protocol for anonymously training gradient-boosted decision trees (GBDTs) on vertically partitioned data between two parties, addressing the privacy risks of revealing shared record IDs inherent in traditional private set intersection (PSI) approaches. The protocol employs dual circuit-PSI with oblivious programmable pseudorandom functions to propagate shared state, avoiding universal alignment and reducing the cost associated with ID hiding. Experimental results demonstrate competitive efficiency compared to non-anonymous methods, enabling privacy-preserving GBDT training for sensitive applications.
Training GBDTs on vertically partitioned data doesn't have to leak which records are shared: this new protocol hides record IDs while maintaining competitive efficiency.
Gradient-boosted decision trees (GBDTs) handle structured data well and are often trained on features that are vertically partitioned across mutually distrustful parties. Their speed and interpretability make GBDTs popular in finance and healthcare, where neural networks may fall short. Enabling secure computation for GBDTs poses unique challenges, because records must be securely aligned for comparison. Private set intersection (PSI) is the de facto approach. Yet PSI, often mistaken for a safety measure, actually exposes which record identifiers (IDs) are shared between the datasets. Although circuit-PSI could help, it is costly for generic uses. New ideas are needed to train efficiently in a "dark forest". Aiming to hide the IDs, we initiate the study of anonymous GBDT training on split data held by two parties. Dual circuit-PSI in our design lets the parties alternate as receiver to run pick-then-sum over local features. Via oblivious programmable pseudorandom functions, we propagate circuit-PSI outputs as shared state across runs. By avoiding universal alignment, we resolve the neglected dilemma that ID hiding incurs a cost that scales with domain size. Next, we halve the cost of ciphertext packing used to convert single-instruction multiple-data homomorphic encryption from (ring) learning with errors in prior secure GBDT (USENIX Security '23) and related secure machine-learning computations. Comparative experiments show that our protocol remains competitive with leaky approaches in efficiency. By enabling ID-hiding aggregation, our techniques can extend to other vertically partitioned analytics.
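To make the contrast between plain PSI and circuit-PSI concrete, here is a minimal plaintext sketch. It is not the paper's protocol: a trusted dealer stands in for all of the cryptography, and every name in it (`plain_psi`, `ideal_circuit_psi`, `pick_then_sum`, the modulus `MOD`, the sample IDs and values) is a hypothetical choice made for illustration. It only shows the input/output behavior: plain PSI reveals the shared IDs, whereas an idealized circuit-PSI hands each party a random-looking share of a membership bit per receiver record, from which shares of a "pick-then-sum" result can be derived.

```python
import secrets

MOD = 2**32  # hypothetical modulus for additive secret sharing


def share(value, mod=MOD):
    """Split `value` into two additive shares that sum to it modulo `mod`."""
    r = secrets.randbelow(mod)
    return r, (value - r) % mod


def reconstruct(s0, s1, mod=MOD):
    """Recombine two additive shares."""
    return (s0 + s1) % mod


def plain_psi(ids_a, ids_b):
    """Plain PSI output: the shared IDs themselves, which is the leak."""
    return sorted(set(ids_a) & set(ids_b))


def ideal_circuit_psi(receiver_ids, sender_ids):
    """Idealized circuit-PSI, simulated by a trusted dealer.

    For each receiver record, output a secret-shared membership bit.
    Neither share alone reveals whether the record is in the intersection.
    """
    sender_set = set(sender_ids)
    return [share(1 if rid in sender_set else 0) for rid in receiver_ids]


def pick_then_sum(bit_shares, values, mod=MOD):
    """Return additive shares of sum(bit_i * value_i).

    Assumption: both partial sums are computed in the clear here. In a real
    protocol the sender never sees `values`, so its term would need
    cryptographic machinery (for example homomorphic encryption).
    """
    part0 = sum(s0 * v for (s0, _), v in zip(bit_shares, values)) % mod
    part1 = sum(s1 * v for (_, s1), v in zip(bit_shares, values)) % mod
    return part0, part1


# Toy data: party A is the receiver and holds one local value per record.
ids_a = ["u1", "u2", "u3", "u4"]
ids_b = ["u2", "u4", "u9"]
values_a = [5, 7, 11, 13]

print(plain_psi(ids_a, ids_b))  # ['u2', 'u4'] -- shared IDs are exposed

bit_shares = ideal_circuit_psi(ids_a, ids_b)
sum_shares = pick_then_sum(bit_shares, values_a)
print(reconstruct(*sum_shares))  # 20 -- only the aggregate is opened
```

In the sketch, only the aggregate over matched records is ever reconstructed; the per-record membership bits stay shared, which is the property the abstract refers to as ID hiding.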