A multi-strategy log classification pipeline exposed via a FastAPI REST endpoint. It routes each log message to one of three classifiers (Regex, BERT, or LLM), depending on the log source and on whether a regex pattern matches.
Each row in the uploaded CSV has two fields: source and log_message. Classification is determined as follows:
| Step | Condition | Classifier Used | Labels Produced |
|---|---|---|---|
| 1 | source == LegacyCRM | LLM (GPT-4o) | Workflow Error, Deprecation Warning |
| 2 | All other sources — pattern match found | Regex | User Action, System Notification |
| 3 | All other sources — no regex match | BERT (SentenceTransformer) | Multi-class or Unclassified |
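The routing in the table can be sketched as a single function. This is a minimal illustration, not the contents of classify.py: the function name `classify_log` and the three callables passed in are hypothetical stand-ins for the real classifiers, and the only behavior taken from the table is the order of the checks.

```python
from typing import Callable, Optional


def classify_log(
    source: str,
    log_message: str,
    regex_classify: Callable[[str], Optional[str]],
    bert_classify: Callable[[str], str],
    llm_classify: Callable[[str], str],
) -> str:
    """Route one log row to a classifier (hypothetical sketch of the table above)."""
    # Step 1: LegacyCRM logs go straight to the LLM classifier.
    if source == "LegacyCRM":
        return llm_classify(log_message)

    # Step 2: every other source tries the regex rules first.
    label = regex_classify(log_message)
    if label is not None:
        return label

    # Step 3: no regex match, so fall back to the BERT-based classifier.
    return bert_classify(log_message)
```

Passing the classifiers in as arguments keeps the sketch runnable without any model or API key; the real orchestration code may import them directly instead.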
Log Classification System/
├── main.py # FastAPI app — POST /classify endpoint
├── classify.py # Orchestration logic (routing between classifiers)
├── src/
│ ├── regex_classifier.py # Rule-based pattern matching
│ ├── bert_classifier.py # SentenceTransformer + Sklearn model
│ └── llm_classifier.py # GPT-4o via LangGraph workflow
├── models/
│ └── log_classifier.joblib # Pre-trained BERT classification model
├── training/
│ └── training.ipynb # Model training notebook
├── test/
│ ├── test.csv # Sample input
│ └── test_results.csv # Sample output
└── assets/
  └── workflow.png # Architecture diagram
Upload a CSV file and receive a classified CSV in response.
Request
curl -X POST http://localhost:8000/classify \
  -F "file=@test/test.csv"
Input CSV format
source,log_message
LegacyCRM,"Case escalation for ticket ID 7324 failed..."
ServerA,"Backup completed successfully."
ServerB,"GET /v2/servers/detail HTTP/1.1 404"
Output CSV — same file with an added target_label column.
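The input-to-output transformation can be sketched as follows. This is an assumption-laden illustration: the project lists pandas for CSV I/O, but this sketch uses the standard-library `csv` module so it runs without extra dependencies, and the helper name `add_target_labels` and the `classify` callable are hypothetical.

```python
import csv
import io
from typing import Callable


def add_target_labels(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return the CSV text with a target_label column appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep the original columns and append the new one at the end.
    fieldnames = list(reader.fieldnames or []) + ["target_label"]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # classify is a placeholder for the real routing logic.
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
    return out.getvalue()
```

Any function that takes a source and a log message and returns a label can be plugged in as `classify`.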
# Install dependencies (uses uv)
uv sync
# Add your OpenAI key for the LLM classifier
echo "OPENAI_API_KEY=sk-..." > .env
# Start the server
python main.py
Server runs at http://localhost:8000.
Matches log messages against a set of predefined regex patterns. Fast and deterministic. Returns None if no pattern matches, triggering BERT fallback.
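A rule-based classifier of this kind might look like the sketch below. The two patterns and the function name `classify_with_regex` are hypothetical examples chosen for illustration; the actual rules live in src/regex_classifier.py and are not reproduced here. Only the labels and the `None` fallback come from the description above.

```python
import re
from typing import Optional

# Hypothetical pattern-to-label rules, for illustration only.
PATTERNS = {
    r"User \w+ logged (in|out)": "User Action",
    r"Backup (started|completed)": "System Notification",
}


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching pattern, or None if nothing matches."""
    for pattern, label in PATTERNS.items():
        if re.search(pattern, log_message, re.IGNORECASE):
            return label
    # None signals the caller to fall back to the next classifier.
    return None
```

Returning `None` rather than a placeholder label keeps the "no match" case unambiguous for the caller.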
Encodes the log message using all-MiniLM-L6-v2 (SentenceTransformer) and classifies with a trained Sklearn model. Returns Unclassified if max probability is below 0.5.
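The confidence threshold can be sketched independently of the model. In the real pipeline the probabilities would presumably come from the trained classifier (for example a scikit-learn `predict_proba` call on the sentence embedding, which is an assumption about the model type); here they are passed in directly so the example runs without any model files. The function name `label_from_probabilities` is hypothetical.

```python
from typing import Sequence


def label_from_probabilities(
    classes: Sequence[str],
    probabilities: Sequence[float],
    threshold: float = 0.5,
) -> str:
    """Pick the most probable class, or 'Unclassified' when confidence is too low."""
    if len(classes) != len(probabilities) or not classes:
        raise ValueError("classes and probabilities must be non-empty and the same length")

    # Index of the highest probability.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])

    # Below the threshold, the prediction is not trusted.
    if probabilities[best] < threshold:
        return "Unclassified"
    return classes[best]
```

For instance, with placeholder classes `["Label A", "Label B", "Label C"]`, probabilities `[0.2, 0.7, 0.1]` select `"Label B"`, while `[0.4, 0.35, 0.25]` fall below the 0.5 threshold and yield `"Unclassified"`.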
Uses GPT-4o with structured output via a single-node LangGraph workflow. Handles only LegacyCRM logs, which require semantic understanding.
| Package | Purpose |
|---|---|
| fastapi / uvicorn | REST API server |
| sentence-transformers | Log message embeddings |
| scikit-learn | BERT classification model |
| langchain-openai / langgraph | LLM-based classification |
| pandas | CSV I/O |
