Puneeth0106/NLP-log-classification

Log Classification System

A multi-strategy log classification pipeline exposed via a FastAPI REST endpoint. It routes each log message to one of three classifiers (Regex, BERT, or LLM), depending on the log source and on whether a regex pattern matches.


Workflow

[Figure: workflow diagram (assets/workflow.png)]


How It Works

Each row in the uploaded CSV has two fields: source and log_message. Classification is determined as follows:

| Step | Condition | Classifier Used | Labels Produced |
|------|-----------|-----------------|-----------------|
| 1 | source == LegacyCRM | LLM (GPT-4o) | Workflow Error, Deprecation Warning |
| 2 | All other sources, regex pattern match found | Regex | User Action, System Notification |
| 3 | All other sources, no regex match | BERT (SentenceTransformer) | Multi-class or Unclassified |
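The routing in the table can be sketched as a single function. This is a minimal illustration, not the code in classify.py: the function name `classify_log` and the three stub classifiers are hypothetical stand-ins, used only so the example runs on its own.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for the real classifiers in src/ (assumptions).
def stub_regex(log_message: str) -> Optional[str]:
    # Returns a label on a match, or None when no pattern matches.
    if log_message.startswith("Backup completed"):
        return "System Notification"
    return None

def stub_bert(log_message: str) -> str:
    return "Unclassified"

def stub_llm(log_message: str) -> str:
    return "Workflow Error"

def classify_log(
    source: str,
    log_message: str,
    regex_classify: Callable[[str], Optional[str]] = stub_regex,
    bert_classify: Callable[[str], str] = stub_bert,
    llm_classify: Callable[[str], str] = stub_llm,
) -> str:
    """Route one log row following the three steps in the table."""
    # Step 1: LegacyCRM logs go straight to the LLM.
    if source == "LegacyCRM":
        return llm_classify(log_message)
    # Step 2: every other source tries the regex rules first.
    label = regex_classify(log_message)
    if label is not None:
        return label
    # Step 3: no regex match, so fall back to the BERT classifier.
    return bert_classify(log_message)
```

Passing the classifiers as arguments keeps the routing logic testable without loading a model or calling an external API.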

Project Structure

Log Classification System/
├── main.py                  # FastAPI app — POST /classify endpoint
├── classify.py              # Orchestration logic (routing between classifiers)
├── src/
│   ├── regex_classifier.py  # Rule-based pattern matching
│   ├── bert_classifier.py   # SentenceTransformer + Sklearn model
│   └── llm_classifier.py    # GPT-4o via LangGraph workflow
├── models/
│   └── log_classifier.joblib  # Trained Sklearn classifier (applied to BERT embeddings)
├── training/
│   └── training.ipynb       # Model training notebook
├── test/
│   ├── test.csv             # Sample input
│   └── test_results.csv     # Sample output
└── assets/
    └── workflow.png         # Architecture diagram

API

POST /classify

Upload a CSV file and receive a classified CSV in response.

Request

curl -X POST http://localhost:8000/classify \
     -F "file=@test/test.csv"

Input CSV format

source,log_message
LegacyCRM,"Case escalation for ticket ID 7324 failed..."
ServerA,"Backup completed successfully."
ServerB,"GET /v2/servers/detail HTTP/1.1 404"

Output CSV: the same rows as the input, with an added target_label column.
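The input-to-output transformation can be sketched as follows. This is an illustration under assumptions: it uses the standard-library csv module instead of pandas so that it runs without dependencies, and `classify_csv` and its `classify` callback are hypothetical names, not the project's API.

```python
import csv
import io
from typing import Callable

def classify_csv(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return the input CSV with a target_label column appended.

    `classify` is a hypothetical callback taking (source, log_message).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["target_label"]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Both columns are required by the input format described above.
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
    return out.getvalue()
```

A real endpoint would also need to reject uploads that lack the source or log_message column; that validation is left out here.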


Setup

# Install dependencies (uses uv)
uv sync

# Add your OpenAI key for the LLM classifier
echo "OPENAI_API_KEY=sk-..." > .env

# Start the server
python main.py

Server runs at http://localhost:8000.


Classifiers

Regex Classifier (src/regex_classifier.py)

Matches log messages against a set of predefined regex patterns. Fast and deterministic. Returns None if no pattern matches, triggering BERT fallback.
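A minimal sketch of this kind of rule-based matcher is shown below. The two patterns are hypothetical examples chosen for illustration; the actual patterns in src/regex_classifier.py are not listed in this README.

```python
import re
from typing import Optional

# Hypothetical patterns (assumptions), each paired with the label it produces.
HYPOTHETICAL_PATTERNS = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|completed)", re.IGNORECASE), "System Notification"),
]

def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching pattern, or None."""
    for pattern, label in HYPOTHETICAL_PATTERNS:
        if pattern.search(log_message):
            return label
    # None signals the caller to fall back to the BERT classifier.
    return None
```

Returning None rather than a placeholder label lets the caller distinguish "no rule applies" from a real classification.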

BERT Classifier (src/bert_classifier.py)

Encodes the log message using all-MiniLM-L6-v2 (SentenceTransformer) and classifies the embedding with a trained Sklearn model. Returns Unclassified if the highest class probability is below 0.5.
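The confidence threshold can be sketched in isolation. The function below assumes the class probabilities have already been computed (for example, by a scikit-learn model's predict_proba on the embedding); the function name and the class names in the usage note are hypothetical.

```python
from typing import Sequence

def label_from_probabilities(
    probabilities: Sequence[float],
    class_names: Sequence[str],
    threshold: float = 0.5,
) -> str:
    """Pick the most probable class, or "Unclassified" below the threshold."""
    if len(probabilities) != len(class_names):
        raise ValueError("probabilities and class_names must have equal length")
    # Index of the highest probability (first one wins on ties).
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best_index] < threshold:
        return "Unclassified"
    return class_names[best_index]
```

For example, with the hypothetical classes `["HTTP Status", "Security Alert"]`, probabilities `[0.45, 0.55]` yield "Security Alert", while a three-class split such as `[0.3, 0.4, 0.3]` falls below the threshold and yields "Unclassified".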

LLM Classifier (src/llm_classifier.py)

Uses GPT-4o with structured output via a LangGraph single-node workflow. Handles LegacyCRM logs exclusively, since these require semantic understanding.
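The sketch below illustrates the general idea of constraining an LLM to the two LegacyCRM labels. It is not the project's LangGraph workflow: it swaps structured output for plain-text validation, and `ask_llm`, `build_prompt`, and the "Unclassified" fallback are all assumptions made for illustration.

```python
from typing import Callable

# Labels the README lists for LegacyCRM logs.
ALLOWED_LABELS = ("Workflow Error", "Deprecation Warning")

def build_prompt(log_message: str) -> str:
    """Build a hypothetical prompt asking for exactly one allowed label."""
    options = ", ".join(ALLOWED_LABELS)
    return (
        f"Classify the log message into one of: {options}.\n"
        "Reply with the label only.\n"
        f"Log message: {log_message}"
    )

def classify_with_llm(log_message: str, ask_llm: Callable[[str], str]) -> str:
    """Validate the model's reply against the allowed labels.

    `ask_llm` is a hypothetical callable that sends a prompt to a model
    and returns its text reply.
    """
    reply = ask_llm(build_prompt(log_message)).strip()
    for label in ALLOWED_LABELS:
        if reply.lower() == label.lower():
            return label
    # Assumed fallback for replies outside the allowed set.
    return "Unclassified"
```

Injecting `ask_llm` keeps the example free of API keys and network calls; in practice it would wrap a call to the configured model.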


Dependencies

| Package | Purpose |
|---------|---------|
| fastapi / uvicorn | REST API server |
| sentence-transformers | Log message embeddings |
| scikit-learn | Classification model trained on the embeddings |
| langchain-openai / langgraph | LLM-based classification |
| pandas | CSV I/O |
