Feb 23, 2026 | arXiv:2602.19818

SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models

Hillel Ohayon, Daniel Gilkarov, Ran Dubin

AI Summary

This paper introduces SafePickle, a machine-learning-based scanner for detecting malicious Python pickle files used to serialize ML models, addressing the remote code execution (RCE) risk of loading models from repositories such as Hugging Face. SafePickle statically extracts structural and semantic features from pickle bytecode and uses supervised and unsupervised learning to classify files as benign or malicious. Experiments on a new labeled dataset, out-of-distribution data, and evasive malware show that SafePickle achieves higher F1-scores than existing scanners (90.01% versus 7.23%-62.75% on the authors' dataset) and is the only evaluated method to correctly classify all 9 evasive Hide-and-Seek models.

Key Contribution

ML-powered detection can generically and robustly identify malicious pickle files, outperforming existing signature-based scanners and correctly flagging malicious models specially crafted to evade them.

Abstract

Model repositories such as Hugging Face increasingly distribute machine learning artifacts serialized with Python's pickle format, exposing users to remote code execution (RCE) risks during model loading. Recent defenses, such as PickleBall, rely on per-library policy synthesis that requires complex system setups and verified benign models, which limits scalability and generalization. In this work, we propose a lightweight, machine-learning-based scanner that detects malicious Pickle-based files without policy generation or code instrumentation. Our approach statically extracts structural and semantic features from Pickle bytecode and applies supervised and unsupervised models to classify files as benign or malicious. We construct and release a labeled dataset of 727 Pickle-based files from Hugging Face and evaluate our models on four datasets: our own, PickleBall (out-of-distribution), Hide-and-Seek (9 advanced evasive malicious models), and synthetic joblib files. Our method achieves 90.01% F1-score compared with 7.23%-62.75% achieved by the SOTA scanners (Modelscan, Fickling, ClamAV, VirusTotal) on our dataset. Furthermore, on the PickleBall data (OOD), it achieves 81.22% F1-score compared with 76.09% achieved by the PickleBall method, while remaining fully library-agnostic. Finally, we show that our method is the only one to correctly parse and classify 9/9 evasive Hide-and-Seek malicious models specially crafted to evade scanners. This demonstrates that data-driven detection can effectively and generically mitigate Pickle-based model file attacks.
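To make the idea of static feature extraction from pickle bytecode concrete, here is a minimal sketch using Python's standard `pickletools.genops`, which walks the opcode stream without ever unpickling the data. It is not the paper's feature set or pipeline: the function name `extract_features`, the `SUSPICIOUS_MODULES` list, and the feature names are assumptions chosen for illustration, and resolving `STACK_GLOBAL` from the two most recent string arguments is a simple heuristic that an evasive file could defeat.

```python
import os
import pickle
import pickletools
from collections import Counter

# Hypothetical watch list of modules; an assumption, not taken from the paper.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "sys", "socket"}


def extract_features(data: bytes) -> dict:
    """Walk the pickle opcode stream statically; nothing is ever unpickled."""
    opcode_counts = Counter()
    imports = []         # (module, name) pairs referenced by the pickle
    recent_strings = []  # string arguments seen so far (for STACK_GLOBAL)

    for opcode, arg, _pos in pickletools.genops(data):
        opcode_counts[opcode.name] += 1
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the argument as "module name"
            module, _, name = arg.partition(" ")
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Heuristic: module and name are usually the last two strings pushed
            imports.append((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)

    return {
        "n_opcodes": sum(opcode_counts.values()),
        "n_reduce": opcode_counts["REDUCE"],  # REDUCE calls a callable at load time
        "n_imports": len(imports),
        "n_suspicious_imports": sum(
            module.split(".")[0] in SUSPICIOUS_MODULES for module, _ in imports
        ),
        "imports": imports,
    }


class Demo:
    """Harmless stand-in for a payload: loading it would call os.getcwd()."""

    def __reduce__(self):
        return (os.getcwd, ())


# Serializing does not execute the callable; only loading would.
benign = pickle.dumps({"weights": [0.1, 0.2]})
flagged = pickle.dumps(Demo())

print(extract_features(benign))
print(extract_features(flagged))
```

A feature dictionary like this could then be turned into a numeric vector and passed to any supervised or unsupervised model; which features and models the authors actually use is described in the paper itself, not here.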

Open-Source Models & Weights | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
