A black-box fingerprinting system that identifies the underlying LLM family (GPT, LLaMA, Mistral, etc.) by analysing response patterns across 31 carefully selected prompts. It can also identify fine-tuned models, tracing them back to their base model.
Note: Check config.py to see all identifiable model families.
A pre-trained classifier is bundled with the package in the model/ directory.
Fingerprinting runs in three sequential layers:
- 31 prompts across 3 layers (discriminative → behavioral → stylistic):
  - Discriminative (11): Identity, knowledge cutoff, architecture, reasoning (the most separating power)
  - Behavioral (7): Safety boundaries, jailbreak resistance, honesty, policy handling
  - Stylistic (13): Formatting, creativity, constraint following, default voice
- Feature extraction per response: 384-dim sentence embeddings + 12 linguistic features + 6 behavioral features = 402 dims per layer, 1206 dims total
- Embedding rebalancing: Per-layer PCA compresses the 384-dim embeddings to 64 dims, giving (64 + 12 + 6) × 3 = 246 working dims
- Ensemble classification: Random Forest (45%) + SVM (45%) + MLP (10%)
- Two-stage identification: Ensemble → model family, Template classifier → specific model version
- Early stopping: After each layer the classifier checks its confidence; if it exceeds the threshold (default 0.95), the remaining layers are skipped, saving API calls.
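The weighted ensemble and the early-stopping rule can be illustrated with a minimal sketch. This is not the package's implementation: the function names, the `classify` callback and the toy classifier are hypothetical, and only the weights (45/45/10) and the default threshold (0.95) come from the description above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Ensemble weights quoted in the list above.
WEIGHTS: Dict[str, float] = {"random_forest": 0.45, "svm": 0.45, "mlp": 0.10}

Probas = Dict[str, Sequence[float]]


def weighted_vote(probas: Probas) -> List[float]:
    """Combine per-classifier class probabilities using the fixed weights."""
    n_classes = len(next(iter(probas.values())))
    combined = [0.0] * n_classes
    for name, class_probs in probas.items():
        for i, p in enumerate(class_probs):
            combined[i] += WEIGHTS[name] * p
    return combined


def identify(
    layers: Sequence[Sequence[float]],
    classify: Callable[[List[float]], Probas],  # hypothetical callback
    families: Sequence[str],
    threshold: float = 0.95,
) -> Tuple[str, float, int]:
    """Feed layers in order and stop once confidence exceeds the threshold.

    Returns (family, confidence, number_of_layers_used).
    """
    if not layers:
        raise ValueError("at least one layer is required")
    features: List[float] = []
    for used, layer_features in enumerate(layers, start=1):
        features.extend(layer_features)
        combined = weighted_vote(classify(features))
        best = max(range(len(combined)), key=lambda i: combined[i])
        if combined[best] > threshold:
            break  # confident enough: skip the remaining layers
    return families[best], combined[best], used


def toy_classify(features: List[float]) -> Probas:
    """Toy stand-in: confidence in class 0 grows with the features seen."""
    p = min(0.99, 0.5 + 0.25 * len(features))
    return {name: [p, 1.0 - p] for name in WEIGHTS}


family, confidence, layers_used = identify(
    [[0.1], [0.2], [0.3]], toy_classify, ["llama", "gpt"]
)
print(family, round(confidence, 2), layers_used)
```

In this toy run the third layer is never requested, which is where the saving in API calls comes from.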
| Backend | Description | API Key Required |
|---|---|---|
| ollama | Local Ollama instance | ❌ No |
| ollama-cloud | Ollama Cloud API | ✅ OLLAMA_CLOUD_API_KEY |
| openai | OpenAI API (or compatible) | ✅ OPENAI_API_KEY |
| gemini | Gemini API | ✅ GEMINI_API_KEY |
| custom | Any HTTP-based LLM API | ✅ Optional |
The custom backend is the most flexible option — use it with:
- Proprietary LLM APIs not natively supported
- Self-hosted LLMs behind HTTP endpoints
- API proxies and gateways
- Any HTTP-based LLM service
All you need is an HTTP request template file. See examples in ./example/.
```bash
# Core package
pip install llm-fingerprinter
# With OpenAI support
pip install llm-fingerprinter[openai]
# With Gemini support
pip install llm-fingerprinter[gemini]
# With all backends
pip install llm-fingerprinter[all]
```

```bash
# Local Ollama
llm-fingerprinter identify -b ollama --model llama3.2
# OpenAI
export OPENAI_API_KEY="your-key"
llm-fingerprinter identify -b openai --model gpt-4o-mini
# Custom endpoint
llm-fingerprinter identify -b custom -r ./custom_request.txt
```

```bash
# Step 1: Generate training fingerprints for each family
# Temperature is automatically varied across simulations for diversity
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama --num-sims 5
llm-fingerprinter simulate -b openai --model gpt-4o-mini --family gpt --num-sims 5
# Step 2: Train the ensemble classifier
llm-fingerprinter train
# Step 3: Build template classifiers (for two-stage identification)
llm-fingerprinter build-templates
llm-fingerprinter build-model-templates
# Step 4: Identify unknown models
llm-fingerprinter identify -b ollama --model some-unknown-model
```

Compute per-family mean vectors from training fingerprints for the open-set template classifier. Run after train.
```bash
llm-fingerprinter build-templates
```

The template classifier uses cosine distance to the nearest mean, so it doesn't require retraining when new families are added.
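A nearest-mean classifier with cosine distance can be sketched in a few lines. This is an illustration, not the package's code: the function names and the toy 2-dim vectors are hypothetical, and the `max_distance` rejection rule is only an assumption about how an open-set classifier might decline to answer.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (one family template)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_template(
    fingerprint: Sequence[float],
    templates: Dict[str, Sequence[float]],
    max_distance: Optional[float] = None,  # assumed open-set rejection rule
) -> Tuple[Optional[str], float]:
    """Return the closest family, or None if nothing is within max_distance."""
    family, distance = min(
        ((name, cosine_distance(fingerprint, t)) for name, t in templates.items()),
        key=lambda pair: pair[1],
    )
    if max_distance is not None and distance > max_distance:
        return None, distance
    return family, distance


# Toy 2-dim fingerprints; real ones would use the full feature vector.
templates = {
    "llama": mean_vector([[1.0, 0.0], [0.8, 0.2]]),
    "gpt": mean_vector([[0.0, 1.0], [0.2, 0.8]]),
}
print(nearest_template([0.7, 0.3], templates))
print(nearest_template([0.5, 0.5], templates, max_distance=0.01))
```

Because each family is represented only by its own mean vector, adding a family means adding one dictionary entry; no existing template changes.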
Build templates at the specific model version level (e.g. gpt-4o-mini vs gpt-4.1) for two-stage identification.
```bash
llm-fingerprinter build-model-templates
```

Requires fingerprints that contain model_name in their metadata (all fingerprints generated with simulate on this version do).
Add a new model family to the template classifier from a few fingerprint samples, without retraining the full ensemble.
```bash
llm-fingerprinter add-family --model deepseek-chat --family deepseek --num-sims 3 -b deepseek
```

Recommended minimum: 3 simulations for a reliable mean template.
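Conceptually, adding a family amounts to averaging a few fingerprints and storing the result next to the existing templates. The sketch below illustrates that idea under stated assumptions: the `add_family` function, its `min_samples` guard and the toy vectors are hypothetical and do not mirror the CLI's internals.

```python
from typing import Dict, List, Sequence


def add_family(
    templates: Dict[str, List[float]],
    family: str,
    samples: Sequence[Sequence[float]],
    min_samples: int = 3,  # mirrors the recommended minimum above
) -> Dict[str, List[float]]:
    """Return a copy of templates with a mean template added for family."""
    if len(samples) < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {len(samples)}")
    dim = len(samples[0])
    if any(len(s) != dim for s in samples):
        raise ValueError("all samples must have the same dimensionality")
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    updated = dict(templates)  # existing families are left untouched
    updated[family] = mean
    return updated


existing = {"llama": [0.9, 0.1]}
updated = add_family(
    existing, "deepseek", [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
)
print(sorted(updated))
```

Averaging over several simulations smooths out run-to-run variation in the responses, which is presumably why a single sample is discouraged.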
| Variable | Backend | Description |
|---|---|---|
| OLLAMA_CLOUD_API_KEY | ollama-cloud | Ollama Cloud API key |
| OPENAI_API_KEY | openai | OpenAI API key |
| GEMINI_API_KEY | gemini | Gemini API key |
| DEEPSEEK_API_KEY | deepseek | DeepSeek API key |
| LOG_LEVEL | all | Logging level (DEBUG, INFO, WARNING) |
| LLM_FINGERPRINTER_DATA | all | Override data directory (fingerprints, model, logs) |
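For scripts that wrap the CLI, the variables above can be read in one place. The following is a rough sketch rather than the package's own configuration loader: `load_settings`, the fallback log level and `DEFAULT_DATA_DIR` are assumptions made for illustration, and the real defaults may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Placeholder fallback, NOT the tool's documented default directory.
DEFAULT_DATA_DIR = Path("llm_fingerprinter_data")
# Levels listed in the table above.
ALLOWED_LEVELS = {"DEBUG", "INFO", "WARNING"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect fingerprinter-related settings from an environment mapping."""
    env = os.environ if env is None else env
    level = env.get("LOG_LEVEL", "INFO").upper()
    if level not in ALLOWED_LEVELS:
        level = "INFO"  # assumed fallback for unrecognised values
    data_dir = Path(env.get("LLM_FINGERPRINTER_DATA") or DEFAULT_DATA_DIR)
    return {
        "log_level": level,
        "data_dir": data_dir,
        # API keys are optional; a missing key is reported as None.
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "gemini_api_key": env.get("GEMINI_API_KEY"),
    }


settings = load_settings({"LOG_LEVEL": "debug", "LLM_FINGERPRINTER_DATA": "/tmp/fp"})
print(settings["log_level"], settings["data_dir"])
```

Passing the environment as a mapping keeps the function easy to test without modifying the real process environment.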
MIT License
