📡 SpamShield — Machine Learning Spam Detection API

SpamShield is a lightweight, production-style machine-learning system for classifying SMS text messages as spam or ham (not spam). It includes a Python-based model training pipeline, a FastAPI prediction service, and a secure, signed REST API for remote inference.

🧠 Overview

SpamShield combines classical ML techniques with modern deployment practices to demonstrate an end-to-end machine learning lifecycle:

- Model training uses the UCI SMS Spam Collection Dataset. Texts are vectorized with TfidfVectorizer and classified with a tuned LogisticRegression model.
- Data management is automated via a KaggleHub download and preprocessing utility.
- Evaluation metrics include F1 score, precision, recall, and average precision (AUC-PR).
- Model packaging exports reproducible .joblib files with integrity verification (SHA-256 hashes and metadata).
- The API service exposes /predict, /health, /ready, and /metrics endpoints under FastAPI, with request-level Prometheus metrics and optional HMAC authentication.
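
The integrity verification mentioned in the packaging step can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `metadata.json` layout, its field names, and the helper names `write_metadata` and `verify_artifact` are all assumptions.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metadata(artifact: Path, metadata_path: Path, version: str) -> dict:
    """Record the artifact's hash and version (hypothetical metadata layout)."""
    metadata = {
        "version": version,
        "artifact": artifact.name,
        "sha256": sha256_of_file(artifact),
    }
    metadata_path.write_text(json.dumps(metadata, indent=2))
    return metadata


def verify_artifact(artifact: Path, metadata_path: Path) -> bool:
    """Return True only if the artifact still matches the recorded hash."""
    metadata = json.loads(metadata_path.read_text())
    return sha256_of_file(artifact) == metadata["sha256"]
```

A loader built along these lines would call `verify_artifact` before deserializing the .joblib file and refuse to start on a mismatch.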

📊 Model Performance

Precision–Recall Curve

Confusion Matrix
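
For readers who want to relate the confusion matrix to the reported precision, recall, and F1, here is a small dependency-free sketch. The labels in the usage note are toy values for illustration only, not SpamShield results, and the names `BinaryMetrics` and `confusion_counts` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    tp: int  # spam predicted as spam
    fp: int  # ham predicted as spam
    fn: int  # spam predicted as ham
    tn: int  # ham predicted as ham

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_counts(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "spam"
) -> BinaryMetrics:
    """Count the four confusion-matrix cells, treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return BinaryMetrics(tp=tp, fp=fp, fn=fn, tn=tn)
```

For example, `confusion_counts(["spam", "ham", "spam"], ["spam", "spam", "ham"])` yields one true positive, one false positive, and one false negative.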

🔐 API Authentication

Incoming requests can optionally be validated with HMAC signatures. Each client signs its requests as follows:

signature = hmac.new(
    secret.encode("utf-8"),
    f"{method}\n{path}\n{timestamp}\n{sha256(body)}\n{api_key}".encode("utf-8"),
    hashlib.sha256
).hexdigest()

The server recomputes this signature to verify authenticity and ensure the payload was not tampered with.
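
The signing scheme above can be turned into a self-contained sign-and-verify sketch. Several details here are assumptions rather than facts about SpamShield: that `sha256(body)` means the hex digest of the raw body bytes, that the timestamp is Unix seconds, and that the server rejects stale timestamps (the `max_skew_seconds` window is hypothetical). The comparison uses `hmac.compare_digest` to avoid timing leaks.

```python
import hashlib
import hmac
import time
from typing import Optional


def canonical_message(
    method: str, path: str, timestamp: str, body: bytes, api_key: str
) -> bytes:
    """Build the string-to-sign in the field order shown above."""
    body_hash = hashlib.sha256(body).hexdigest()  # assumption: hex digest of raw body
    return f"{method}\n{path}\n{timestamp}\n{body_hash}\n{api_key}".encode("utf-8")


def sign(
    secret: str, method: str, path: str, timestamp: str, body: bytes, api_key: str
) -> str:
    message = canonical_message(method, path, timestamp, body, api_key)
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify(
    secret: str,
    method: str,
    path: str,
    timestamp: str,
    body: bytes,
    api_key: str,
    signature: str,
    max_skew_seconds: float = 300.0,  # hypothetical freshness window
    now: Optional[float] = None,
) -> bool:
    """Recompute the signature and compare it in constant time."""
    current = time.time() if now is None else now
    try:
        sent_at = float(timestamp)  # assumption: Unix seconds
    except ValueError:
        return False
    if abs(current - sent_at) > max_skew_seconds:
        return False  # reject stale or far-future requests
    expected = sign(secret, method, path, timestamp, body, api_key)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Because the body hash is part of the signed message, changing even one byte of the payload after signing makes `verify` return `False`.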

Note: This approach is designed for machine-to-machine integrity verification. For user-level authentication, API key management, or OAuth2-style login flows, consider one of the following:

- FastAPI Users for JWT-based user registration and authentication
- Auth0, AWS Cognito, or Supabase Auth for managed identity and token-based authorization
- HMAC signing combined with authenticated API keys, for hybrid setups where both integrity and identity matter

🧩 Metrics & Observability

Prometheus metrics exposed at /metrics:
- api_requests_total — per-route request counts
- request_latency_seconds — end-to-end latency histogram
- model_inference_seconds — inference-only timing
- request_payload_bytes — incoming payload size distribution

Logs are JSON-structured and carry request IDs for traceability.
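
JSON-structured logging with request IDs can be approximated with the standard `logging` module. This is a sketch under assumptions: the field names, the `JsonFormatter` class, and the logger name `spamshield.example` are illustrative and not taken from the SpamShield source.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id arrives via `extra=`; None when the caller omits it.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str = "spamshield.example") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical name)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = build_logger()
request_id = str(uuid.uuid4())  # one ID per incoming request
logger.info("prediction served", extra={"request_id": request_id})
```

In a web service the request ID would typically be generated once per request in middleware and attached to every log line emitted while handling it.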

🚀 Deployment

SpamShield is containerized and can be deployed on AWS ECS or any environment supporting Docker.

Local deployment

Requirements:

Python 3.14+
uv (Astral’s Python package manager)
Docker (for containerized runtime)

Ensure that you have a trained, versioned model available in the runtime models/ directory.

docker build --build-arg SPAMSHIELD_MODEL_VERSION=v1.0.0 -t spamshield:v1.0.0 .
docker run -p 8080:8080 spamshield:v1.0.0
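
Once the container is running, a small script can wait for the service to come up before sending traffic. The sketch below polls a URL until it answers with HTTP 200. It assumes the /ready endpoint returns 200 when the model is loaded, which the README does not spell out, and `wait_until_ready` is a hypothetical helper name.

```python
import http.client
import time
import urllib.request


def wait_until_ready(
    url: str, timeout_seconds: float = 30.0, interval_seconds: float = 1.0
) -> bool:
    """Poll `url` until it returns HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts, and non-2xx responses
            # (urllib raises HTTPError, a subclass of OSError) land here.
            pass
        time.sleep(interval_seconds)
    return False
```

With the port mapping above, `wait_until_ready("http://localhost:8080/ready")` would return `True` once the service responds and `False` if it does not come up within the timeout.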

⚠️ Current Limitations & Future Work

SpamShield is designed to mimic a production environment but is intentionally simple. The main limitations and possible improvements are:

| Area | Limitation | Potential Improvement |
| --- | --- | --- |
| Modeling | Classical logistic regression only. No contextual NLP | Experiment with transformer-based embeddings (e.g., `distilbert-base-uncased`) |
| Dataset | Limited to small SMS dataset | Add multilingual datasets and larger email/text messages |
| Thresholding | Static threshold stored in metadata | Implement dynamic calibration or per-user thresholds |
| Authentication | HMAC keys stored as plain environment vars | Integrate AWS Secrets Manager / KMS rotation |
| Scalability | Single model instance in memory | Add model caching & autoscaling with ECS target tracking |
| Monitoring | Basic Prometheus histograms only | Include inference-level metrics and model drift detection |

🧪 Quick Start (Local)

Create the dataset:

uv run create-spam-dataset --output data/spam.csv

Train the model:

uv run train-spam-model --dataset data/spam.csv --version v1.0.0 --plots

Move the model into the API runtime model directory:
```
mv models/v1.0.0 src/spamshield/api/models
```
Update .env.dev to use the correct model version:
```
SPAMSHIELD_MODEL_VERSION="v1.0.0"
```

Run the API:

uv run fastapi dev src/spamshield/api/main

Send a prediction request:

uv run scripts/request.py -u http://localhost:8000 -m "Click here for free cash!"

Example Response:

{ "label": "spam", "score": 0.9823 }
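
A client consuming this response might parse and validate it before acting on it. The sketch below assumes the response always carries exactly the `label` and `score` fields shown above, that labels are limited to "spam" and "ham", and that the score lies in [0, 1]. The `Prediction` type and `parse_prediction` helper are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    label: str
    score: float


def parse_prediction(raw: str) -> Prediction:
    """Parse a /predict response body and reject unexpected values."""
    data = json.loads(raw)
    label = data["label"]
    score = float(data["score"])
    if label not in {"spam", "ham"}:
        raise ValueError(f"unexpected label: {label!r}")
    if not 0.0 <= score <= 1.0:  # assumption: score is probability-like
        raise ValueError(f"score out of range: {score}")
    return Prediction(label=label, score=score)


prediction = parse_prediction('{ "label": "spam", "score": 0.9823 }')
print(prediction.label, prediction.score)
```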
