Log Classification System

A multi-strategy log classification pipeline exposed through a FastAPI REST endpoint. It routes each log message to one of three classifiers (Regex, BERT, or an LLM) based on the log source and on whether a predefined regex pattern matches. Messages that match no pattern fall back to BERT.

Workflow

(Architecture diagram: `assets/workflow.png`)

How It Works

Each row in the uploaded CSV has two fields, `source` and `log_message`. The classifier is chosen as follows:

Step	Condition	Classifier used	Labels produced
1	`source == "LegacyCRM"`	LLM (GPT-4o)	`Workflow Error`, `Deprecation Warning`
2	Any other source, and a regex pattern matches	Regex	`User Action`, `System Notification`
3	Any other source, and no regex pattern matches	BERT (SentenceTransformer embeddings + scikit-learn classifier)	A label from the trained model, or `Unclassified`
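The routing in the table can be sketched as follows. This is a minimal illustration, not the code in `classify.py`: the function name `classify_log` and the choice to pass the three classifiers in as callables are assumptions made so the example runs on its own.

```python
from typing import Callable, Optional


def classify_log(
    source: str,
    log_message: str,
    regex_classify: Callable[[str], Optional[str]],
    bert_classify: Callable[[str], str],
    llm_classify: Callable[[str], str],
) -> str:
    """Route one log message to a classifier (hypothetical helper)."""
    # Step 1: LegacyCRM logs always go to the LLM, regardless of content.
    if source == "LegacyCRM":
        return llm_classify(log_message)
    # Step 2: try the fast, deterministic regex rules first.
    label = regex_classify(log_message)
    if label is not None:
        return label
    # Step 3: no pattern matched, so fall back to the BERT classifier.
    return bert_classify(log_message)
```

Checking the source before the regex rules means a LegacyCRM message is never labeled by a pattern, even if one happens to match it.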

Project Structure

Log Classification System/
├── main.py                  # FastAPI app — POST /classify endpoint
├── classify.py              # Orchestration logic (routing between classifiers)
├── src/
│   ├── regex_classifier.py  # Rule-based pattern matching
│   ├── bert_classifier.py   # SentenceTransformer + scikit-learn model
│   └── llm_classifier.py    # GPT-4o via LangGraph workflow
├── models/
│   └── log_classifier.joblib  # Trained scikit-learn classifier (on BERT embeddings)
├── training/
│   └── training.ipynb       # Model training notebook
├── test/
│   ├── test.csv             # Sample input
│   └── test_results.csv     # Sample output
└── assets/
    └── workflow.png         # Architecture diagram

API

`POST /classify`

Upload a CSV file and receive a classified CSV in response.

Request

curl -X POST http://localhost:8000/classify \
     -F "file=@test/test.csv"

Input CSV format

source,log_message
LegacyCRM,"Case escalation for ticket ID 7324 failed..."
ServerA,"Backup completed successfully."
ServerB,"GET /v2/servers/detail HTTP/1.1 404"

Output CSV: the same rows with a `target_label` column appended.
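A minimal sketch of the CSV round trip, assuming a hypothetical `add_target_labels` helper and a `classify(source, log_message)` callable such as the routing function described earlier. The project lists `pandas` for CSV I/O; this sketch swaps in the standard-library `csv` module so it runs without extra dependencies.

```python
import csv
import io
from typing import Callable


def add_target_labels(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return the input CSV text with a target_label column appended (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep the original columns in order and add the new one at the end.
    fieldnames = list(reader.fieldnames or []) + ["target_label"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
    return out.getvalue()
```

Quoted fields such as `"Backup completed successfully."` are parsed and re-quoted by the `csv` module as needed, so commas inside log messages survive the round trip.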

Setup

# Install dependencies (uses uv)
uv sync

# Add your OpenAI key for the LLM classifier
echo "OPENAI_API_KEY=sk-..." > .env

# Start the server
python main.py

Server runs at http://localhost:8000.

Classifiers

Regex Classifier (`src/regex_classifier.py`)

Matches each log message against a set of predefined regex patterns. It is fast and deterministic, and returns `None` when no pattern matches, which triggers the BERT fallback.
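A minimal sketch of the pattern-matching approach. The patterns below are hypothetical examples chosen only to illustrate the two regex labels; the real rules live in `src/regex_classifier.py` and are not reproduced here.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; not the project's actual rules.
PATTERNS = {
    r"User \w+ logged (in|out)": "User Action",
    r"Account with ID \S+ created by \S+": "User Action",
    r"Backup (started|ended|completed successfully)": "System Notification",
    r"System updated to version": "System Notification",
}


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching pattern, or None to signal BERT fallback."""
    for pattern, label in PATTERNS.items():
        if re.search(pattern, log_message, re.IGNORECASE):
            return label
    return None
```

Returning `None` rather than a placeholder label keeps the "no match" case distinct, so the router can hand the message to BERT.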

BERT Classifier (`src/bert_classifier.py`)

Encodes the log message with the `all-MiniLM-L6-v2` SentenceTransformer model and classifies the embedding with a trained scikit-learn model. Returns `Unclassified` if the highest predicted class probability is below 0.5.
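The thresholding step can be sketched as below. The helper name `classify_with_bert` is hypothetical, and the encoder and model are passed in so the logic can be exercised without downloading weights. It assumes `models/log_classifier.joblib` holds a fitted scikit-learn classifier that exposes `predict_proba` and `classes_`.

```python
from typing import Any


def classify_with_bert(log_message: str, encoder: Any, model: Any, threshold: float = 0.5) -> str:
    """Embed one message, pick the most probable class, or return Unclassified."""
    embeddings = encoder.encode([log_message])          # one embedding row per message
    probabilities = model.predict_proba(embeddings)[0]  # one probability per class
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    # Low-confidence predictions are reported as Unclassified instead of guessed.
    if probabilities[best] < threshold:
        return "Unclassified"
    return str(model.classes_[best])


# Usage with the real models (assumes sentence-transformers and joblib are installed):
# from sentence_transformers import SentenceTransformer
# import joblib
# encoder = SentenceTransformer("all-MiniLM-L6-v2")
# model = joblib.load("models/log_classifier.joblib")
# classify_with_bert("GET /v2/servers/detail HTTP/1.1 404", encoder, model)
```

A probability of exactly 0.5 is not "below" the threshold, so it still yields the model's label.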

LLM Classifier (`src/llm_classifier.py`)

Uses GPT-4o with structured output, run as a single-node LangGraph workflow. It handles only `LegacyCRM` logs, which require semantic understanding.
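The real classifier relies on `langchain-openai` structured output to constrain the model's reply. As a dependency-free illustration of the same idea, the sketch below asks for a JSON label and validates it against the two allowed LegacyCRM labels. The `call_model` callable, the prompt wording, and the `Unclassified` fallback for invalid replies are all assumptions, not the project's behavior.

```python
import json
from typing import Callable

ALLOWED_LABELS = {"Workflow Error", "Deprecation Warning"}

# Hypothetical prompt; the project's actual prompt is not shown in this README.
PROMPT_TEMPLATE = (
    "Classify the log message as one of: Workflow Error, Deprecation Warning. "
    'Reply with JSON of the form {{"label": "..."}}.\n'
    "Log message: {log_message}"
)


def classify_with_llm(log_message: str, call_model: Callable[[str], str]) -> str:
    """Send a prompt through call_model and accept only an allowed label (assumed fallback)."""
    raw = call_model(PROMPT_TEMPLATE.format(log_message=log_message))
    try:
        label = json.loads(raw).get("label")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return "Unclassified"
    if isinstance(label, str) and label in ALLOWED_LABELS:
        return label
    return "Unclassified"
```

Structured output moves this validation to the model side, which is why the project can expect a label from the allowed set rather than free text.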

Dependencies

Package	Purpose
`fastapi` / `uvicorn`	REST API server
`sentence-transformers`	Log message embeddings
`scikit-learn`	Classifier trained on BERT embeddings
`langchain-openai` / `langgraph`	LLM-based classification
`pandas`	CSV I/O
