This paper introduces a provenance-aware vulnerability analysis approach for Python applications that considers both vendored libraries and OS package versions to reduce false positives and negatives in vulnerability detection. The method uses content-based hashing to resolve vendored libraries to specific OS package versions and dynamic analysis to extract version information from binaries. By constructing cross-ecosystem call graphs, the approach enables reachability analysis of vulnerable functions, identifying both directly and indirectly vulnerable packages.
Current Python vulnerability scanners miss vulnerable packages with millions of downloads because they ignore vendored dependencies, and they raise false alarms because they ignore OS-level security patches.
Python applications depend on native libraries that may be vendored within package distributions or installed on the host system. When vulnerabilities are discovered in these libraries, determining which Python packages are affected requires cross-ecosystem analysis spanning Python dependency graphs and OS package versions. Current vulnerability scanners produce false negatives by missing vendored vulnerabilities and false positives by ignoring security patches backported by OS distributions. We present a provenance-aware vulnerability analysis approach that resolves vendored libraries to specific OS package versions or upstream releases. Our approach queries vendored libraries against a database of historical OS package artifacts using content-based hashing, and applies library-specific dynamic analyses to extract version information from binaries built from upstream source. We then construct cross-ecosystem call graphs by stitching together Python and binary call graphs across dependency boundaries, enabling reachability analysis of vulnerable functions. Evaluating on 100,000 Python packages and 10 known CVEs associated with third-party native dependencies, we identify 39 directly vulnerable packages (47M+ monthly downloads) and 312 indirectly vulnerable client packages affected through dependency chains. Our analysis achieves up to 97% false positive reduction compared to upstream version matching.
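As a rough illustration of two steps named in the abstract, the sketch below resolves a vendored binary to an OS package by content hash and then checks reachability over a stitched call graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the choice of SHA-256, the in-memory index, the edge-list representation, and every function, package, and symbol name are hypothetical.

```python
import hashlib
from collections import deque
from typing import Optional

# Assumption: the artifact database is modeled as a dict mapping the SHA-256
# digest of a library file's bytes to a hypothetical (OS package, version) pair.
def resolve_provenance(blob: bytes, artifact_index: dict) -> Optional[tuple]:
    """Return the (package, version) whose artifact matches blob, if any."""
    digest = hashlib.sha256(blob).hexdigest()
    return artifact_index.get(digest)

def stitch(python_edges, binary_edges, boundary_edges) -> dict:
    """Merge per-ecosystem call graphs into one adjacency map.

    boundary_edges link a Python-side function to the native symbol it
    invokes; how those links are discovered is outside this sketch.
    """
    graph = {}
    for src, dst in [*python_edges, *binary_edges, *boundary_edges]:
        graph.setdefault(src, set()).add(dst)
    return graph

def reachable(graph: dict, entry: str, target: str) -> bool:
    """Breadth-first search from entry; True if target can be reached."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

if __name__ == "__main__":
    # Hypothetical vendored library bytes and distro package entry.
    index = {hashlib.sha256(b"libfoo-bytes").hexdigest(): ("libfoo1", "1.2.3-4")}
    print(resolve_provenance(b"libfoo-bytes", index))

    graph = stitch(
        python_edges=[("pkg.api.load", "pkg._native.decode")],
        binary_edges=[("foo_decode", "foo_parse_header")],
        boundary_edges=[("pkg._native.decode", "foo_decode")],
    )
    # Is the (hypothetical) vulnerable function reachable from the public API?
    print(reachable(graph, "pkg.api.load", "foo_parse_header"))
```

A package would be flagged only when the resolved version is unpatched and the vulnerable symbol is reachable from its entry points; an exact-hash miss would fall back to the binary version extraction that the abstract describes, which this sketch does not attempt.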