Carrtik/mlflow-audit

mlflow-audit

A static analysis tool that detects missing MLFLOW_ALLOW_PICKLE_DESERIALIZATION guards in Python codebases that rely on MLflow's pickle deserialization.

Background

MLflow uses MLFLOW_ALLOW_PICKLE_DESERIALIZATION as a security control to block unsafe pickle deserialization by default. Every pickle load in the codebase is supposed to check this guard before proceeding.
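As a rough illustration of what "checking the guard" means, here is a minimal sketch of a guarded load. It is not MLflow's implementation: the helper names `pickle_allowed` and `load_pickle_guarded` are hypothetical, and reading the flag straight from `os.environ` with only "true" or "1" counting as opt-in is an assumption made for the example.

```python
import os
import pickle

GUARD_ENV_VAR = "MLFLOW_ALLOW_PICKLE_DESERIALIZATION"


def pickle_allowed():
    """Return True only when the guard variable explicitly opts in.

    Assumption for this sketch: an unset variable means "blocked",
    and only "true" or "1" (case-insensitive) enable pickle loading.
    """
    value = os.environ.get(GUARD_ENV_VAR, "false")
    return value.strip().lower() in ("true", "1")


def load_pickle_guarded(path):
    """Hypothetical helper: refuse to unpickle unless the guard allows it."""
    if not pickle_allowed():
        raise RuntimeError(
            f"Pickle deserialization is disabled; set {GUARD_ENV_VAR}=true "
            "to load files you trust."
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

The point of the pattern is that the check sits directly in front of the load call, so no code path can reach `pickle.load` without passing it.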

This tool was built after discovering that MLflow's LangChain integration was missing this guard in _load_from_pickle(). The issue was reported via GitHub private security advisory GHSA-cxjq-35gw-4m9f.

The finding: sklearn, tensorflow, pytorch, pmdarima, dspy, and evaluation artifacts all check the guard, but the LangChain integration did not. An attacker could craft a malicious LangChain model and achieve remote code execution (RCE) on the machine of anyone who loads it, bypassing the security control entirely.
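For readers unfamiliar with why an unguarded pickle load amounts to code execution, the following harmless sketch may help. It uses only the standard library. The `Payload` class is hypothetical and deliberately benign: it makes the loader call `os.path.basename`, where a real attack would name a dangerous callable instead.

```python
import os.path
import pickle


class Payload:
    """Benign stand-in for a malicious object embedded in a model file."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # performs that call when the bytes are loaded.
        return (os.path.basename, ("/tmp/chosen-by-the-payload",))


blob = pickle.dumps(Payload())

# The loader never constructs a Payload; it runs the recorded call instead.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Because the callable is chosen by whoever produced the bytes, loading is only safe when the file comes from a trusted source, which is the decision the guard forces the user to make explicitly.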

This tool automates the audit so you can check your own MLflow-based codebase for the same class of issue.

Installation

pip install -e .

Usage

# Scan a directory
mlflow-audit ./your-mlflow-project

# Scan MLflow source itself
mlflow-audit ./mlflow

# Show guarded loads too
mlflow-audit ./mlflow --show-guarded

Example Output

[*] Scanning 847 Python files in ./mlflow...

mlflow-audit: Pickle Deserialization Guard Scanner
Total pickle calls found : 14
Unguarded (HIGH)         : 1
Guarded (OK)             : 13

[!] UNGUARDED PICKLE LOADS — REVIEW REQUIRED
[HIGH] ✗ UNGUARDED
File: mlflow/langchain/utils/logging.py:452
Call: cloudpickle.load
Context:
449 | def _load_from_pickle(path):
450 |     with open(path, "rb") as f:

452 |         return cloudpickle.load(f)
453 |

What It Detects

Pickle and cloudpickle load calls that have none of the following guard markers within a 10-line context window:

  • MLFLOW_ALLOW_PICKLE_DESERIALIZATION
  • allow_pickle
  • weights_only
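The check described above can be approximated in a few lines with Python's `ast` module. This is a sketch of the general technique, not the tool's actual source: the function name `find_unguarded_loads` is hypothetical, and it assumes the window covers the call line plus the 10 lines before it, and that only calls written as `pickle.load(...)`, `pickle.loads(...)`, `cloudpickle.load(...)`, or `cloudpickle.loads(...)` are matched.

```python
import ast

GUARD_MARKERS = (
    "MLFLOW_ALLOW_PICKLE_DESERIALIZATION",
    "allow_pickle",
    "weights_only",
)
PICKLE_MODULES = {"pickle", "cloudpickle"}
LOAD_FUNCS = {"load", "loads"}


def find_unguarded_loads(source, window=10):
    """Return sorted (line_number, call_name) pairs for unguarded loads.

    Assumption: a load counts as guarded when any marker string appears
    on the call line or in the `window` lines before it.
    """
    lines = source.splitlines()
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match calls of the form <module>.<load function>(...)
        if not (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id in PICKLE_MODULES
            and func.attr in LOAD_FUNCS
        ):
            continue
        start = max(0, node.lineno - 1 - window)
        context = "\n".join(lines[start:node.lineno])
        if not any(marker in context for marker in GUARD_MARKERS):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)


SAMPLE = '''import cloudpickle

def _load_from_pickle(path):
    with open(path, "rb") as f:
        return cloudpickle.load(f)
'''

for lineno, call in find_unguarded_loads(SAMPLE):
    print(f"[HIGH] line {lineno}: {call}")
```

A text-window heuristic like this is cheap and easy to audit, but it can miss guards enforced in a caller or a decorator, and it can be fooled by a marker that appears only in a comment, so every finding still needs manual review.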

What It Does Not Do

This is not a malware scanner. It does not inspect pickle file contents for malicious payloads; tools like PickleScan do that.

This tool specifically audits whether your MLflow integration code consistently applies the security guard that MLflow provides.

Vulnerability Reference

  • Advisory: GHSA-cxjq-35gw-4m9f
  • CWE: CWE-502 Deserialization of Untrusted Data
  • Severity: High
  • Reporter: Kartik Nair

Author

Kartik Nair: github.com/Carrtik, medium.com/@contact.kartikn
