This paper distills the privacy assessment capabilities of the 675B Mistral Large 3 model into smaller encoder models (as small as 150M parameters). The authors train these smaller models on a large dataset of privacy-annotated texts spanning 10 domains, significantly reducing computational costs while maintaining strong agreement with human privacy judgments. The distilled models are validated on human-annotated test data and shown to be useful for evaluating de-identification systems.
You can shrink a privacy-expert LLM by 4500x and still get human-level privacy judgments.
Accurate privacy evaluation of textual data remains a critical challenge in privacy-preserving natural language processing. Recent work has shown that large language models (LLMs) can serve as reliable privacy evaluators, achieving strong agreement with human judgments; however, their computational cost and the impracticality of processing sensitive data at scale limit real-world deployment. We address this gap by distilling the privacy assessment capabilities of Mistral Large 3 (675B) into lightweight encoder models with as few as 150M parameters. Leveraging a large-scale dataset of privacy-annotated texts spanning 10 diverse domains, we train efficient classifiers that preserve strong agreement with human annotations while dramatically reducing computational requirements. We validate our approach on human-annotated test data and demonstrate its practical utility as an evaluation metric for de-identification systems.
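For intuition, here is a minimal sketch of what such a distillation recipe could look like: a small pretrained encoder is fine-tuned as a binary privacy classifier on texts labeled by the large teacher model. Everything here is an assumption for illustration, not the paper's actual setup; `roberta-base` (~125M parameters) stands in for the unspecified 150M-parameter encoder, and the two example texts stand in for the teacher-annotated corpus.

```python
# Hypothetical sketch of distillation via teacher-labeled fine-tuning.
# Assumptions (not from the paper): student = roberta-base, binary labels
# (0 = no privacy risk, 1 = privacy risk) produced by the teacher LLM.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer


class PrivacyDataset(Dataset):
    """Pairs each text with the teacher LLM's privacy label."""

    def __init__(self, texts, labels, tokenizer, max_len=256):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


# Placeholder teacher-annotated examples; in practice these would be the
# LLM's judgments over the 10-domain corpus.
texts = ["Patient John Doe, DOB 1980-01-01, was admitted on Tuesday.",
         "Spring weather in the region is generally mild."]
labels = [1, 0]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)

loader = DataLoader(PrivacyDataset(texts, labels, tokenizer), batch_size=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One pass over the toy data: cross-entropy against the teacher's labels.
model.train()
for batch in loader:
    out = model(**batch)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time, the fine-tuned student replaces the teacher: a single forward pass over a candidate text yields a privacy judgment, which is what makes it cheap enough to score de-identification system outputs at scale.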