Log Classification with Hybrid NLP Framework
This project implements a hybrid log classification system that combines regex rules, machine learning (Sentence Transformer embeddings + Logistic Regression), and LLM-based reasoning to sort log messages into meaningful categories.
It is a complete end-to-end production system with a separate FastAPI backend and Streamlit frontend, each packaged with Docker and deployable on Render.
Project Architecture

Log_Classification_Project/
│
├── backend/
│   ├── api/
│   │   └── server.py               # FastAPI backend API
│   ├── src/
│   │   ├── classify.py             # Core hybrid classification pipeline
│   │   ├── processor_regex.py      # Regex-based classification logic
│   │   └── preprocess.py           # Text cleaning and preprocessing
│   ├── models/
│   │   ├── log_model.pkl           # Logistic Regression model
│   │   └── sentence_transformer/   # Sentence Transformer model
│   ├── requirements.txt
│   ├── Dockerfile
│   └── start.sh                    # Backend startup script
│
├── frontend/
│   ├── dashboard/
│   │   └── st_app.py               # Streamlit dashboard for visualization
│   ├── requirements.txt
│   ├── Dockerfile
│   └── start.sh                    # Frontend startup script
│
├── resources/
│   ├── example_logs.csv
│   └── architecture.png
│
└── README.md
Hybrid Classification Workflow
The hybrid classifier follows a three-tiered logic flow:
🧩 Regex Layer: Quickly detects well-known or patterned log formats (e.g. ERROR 404, Timeout, Disk full).
ML Layer (Sentence Transformer + Logistic Regression): Encodes each log message as an embedding and predicts its label with a trained Logistic Regression model. This layer is suited to structured log data.
LLM Fallback Layer: Invoked when neither layer above produces a confident label, for example when the regex rules find no match and the ML prediction is uncertain, as with ambiguous messages.
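The three-tier flow can be sketched as a simple cascade. This is a minimal illustration, not the code in `backend/src/classify.py`. The regex rules, their labels, `CONFIDENCE_THRESHOLD`, `classify_log`, and the stand-in `fake_ml` and `fake_llm` callables are all hypothetical. The ML layer is passed in as a callable assumed to return `(label, probability)`.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical rules for illustration; the project's real patterns live in
# backend/src/processor_regex.py and are not reproduced here.
REGEX_RULES = [
    (re.compile(r"\b(?:timeout|timed out)\b", re.IGNORECASE), "NetworkError"),
    (re.compile(r"\bdisk full\b", re.IGNORECASE), "ResourceError"),
    (re.compile(r"\bERROR 404\b"), "HTTPError"),
]

# Hypothetical cut-off below which the ML prediction is treated as uncertain.
CONFIDENCE_THRESHOLD = 0.5


def classify_regex(message: str) -> Optional[str]:
    """Tier 1: return the label of the first matching pattern, or None."""
    for pattern, label in REGEX_RULES:
        if pattern.search(message):
            return label
    return None


def classify_log(
    message: str,
    ml_predict: Callable[[str], Tuple[str, float]],
    llm_classify: Callable[[str], str],
) -> Tuple[str, str]:
    """Classify one message and report which tier produced the label."""
    # Tier 1: cheap, deterministic pattern matching.
    label = classify_regex(message)
    if label is not None:
        return label, "regex"

    # Tier 2: ml_predict is assumed to return (label, probability), for example
    # SentenceTransformer.encode() followed by LogisticRegression.predict_proba().
    ml_label, confidence = ml_predict(message)
    if confidence >= CONFIDENCE_THRESHOLD:
        return ml_label, "ml"

    # Tier 3: only uncertain messages are sent to the slower LLM.
    return llm_classify(message), "llm"


# Usage with stand-in ML and LLM callables (hypothetical):
def fake_ml(msg: str) -> Tuple[str, float]:
    return ("IOError", 0.9) if "FileNotFound" in msg else ("Unknown", 0.2)


def fake_llm(msg: str) -> str:
    return "Unclassified"


print(classify_log("Connection failed due to timeout", fake_ml, fake_llm))
# → ('NetworkError', 'regex')
print(classify_log("FileNotFoundError: dataset.csv missing", fake_ml, fake_llm))
# → ('IOError', 'ml')
print(classify_log("Unexpected state in worker pool", fake_ml, fake_llm))
# → ('Unclassified', 'llm')
```

Returning the tier name with the label makes it easy to measure how often the expensive LLM fallback is actually reached.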
Local Setup
1️⃣ Clone the Repository
git clone https://github.com//log-classification.git
cd log-classification
2️⃣ Backend Setup
cd backend
python -m venv logenv
logenv\Scripts\activate        # Windows
source logenv/bin/activate     # macOS / Linux
pip install -r requirements.txt
uvicorn api.server:app --host 0.0.0.0 --port 8000
Endpoints:
API Root → http://127.0.0.1:8000/
Swagger UI → http://127.0.0.1:8000/docs
Redoc → http://127.0.0.1:8000/redoc
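The exact request schema for the classify endpoint is documented in the Swagger UI. As a rough sketch, a Python client can call the endpoint using only the standard library. This assumes `/classify` accepts a JSON POST with a hypothetical `{"logs": [...]}` body. `build_classify_request` and `classify_remote` are illustrative names, not part of the project. Check `/docs` for the real field names.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumed endpoint; the real path and schema are defined in backend/api/server.py.
API_URL = "http://127.0.0.1:8000/classify"


def build_classify_request(
    logs: List[Dict[str, str]], api_url: str = API_URL
) -> urllib.request.Request:
    """Build a JSON POST request for a batch of {source, log_message} records.

    The {"logs": [...]} envelope is a hypothetical schema used for illustration.
    """
    body = json.dumps({"logs": logs}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_remote(logs: List[Dict[str, str]], api_url: str = API_URL) -> Any:
    """Send the request to a running backend and decode its JSON response."""
    request = build_classify_request(logs, api_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `classify_remote([{"source": "app1", "log_message": "Connection failed due to timeout"}])` would return the decoded JSON response. Building the request separately lets it be inspected without network access.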
3️⃣ Frontend Setup
cd ../frontend
pip install -r requirements.txt
streamlit run dashboard/st_app.py
Once it is running, open http://localhost:8501 in your browser.
🧩 Update API_URL inside your Streamlit app (dashboard/st_app.py) to point to your deployed backend API:
API_URL = "https://your-backend-service.onrender.com/classify"
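Instead of editing this constant for every environment, the URL could be read from an environment variable, which Render lets you set per service. Here is a minimal sketch that assumes a hypothetical `API_URL` environment variable. As described above, the project itself hard-codes the value in `st_app.py`. `get_api_url` and `DEFAULT_API_URL` are illustrative names.

```python
import os

# Fallback for local development: the backend started with uvicorn on port 8000.
DEFAULT_API_URL = "http://127.0.0.1:8000/classify"


def get_api_url() -> str:
    """Return the backend URL from the API_URL environment variable, if set."""
    url = os.environ.get("API_URL", DEFAULT_API_URL).strip()
    # Normalise ".../classify/" to ".../classify" so both spellings behave alike.
    return url.rstrip("/")


API_URL = get_api_url()
```

On Render, you would then set `API_URL=https://your-backend-service.onrender.com/classify` in the frontend service's environment, and local runs fall back to the default.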
🐳 Docker Setup
Each module (backend + frontend) has its own Dockerfile.
Build Backend Image
cd backend
docker build -t log-backend .
docker run -p 8000:8000 log-backend
Build Frontend Image
cd ../frontend
docker build -t log-frontend .
docker run -p 8501:8501 log-frontend
Deploy on Render
1️⃣ Create Two Services:

| Service | Folder | Type | Port |
|---|---|---|---|
| Backend Service | /backend | FastAPI (Docker) | 8000 |
| Frontend Service | /frontend | Streamlit (Docker) | 8501 |

2️⃣ Environment Variables
In both Render services:
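PYTHON_VERSION = 3.10
(Render reads this variable for its native Python runtime. For Docker services, the Python version comes from the base image in each Dockerfile.)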
For backend, also add:
MODEL_PATH = ./models/log_model.pkl
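The backend presumably reads MODEL_PATH at startup, though how `server.py` consumes it is not shown here. Below is a minimal sketch that assumes the model was saved with the standard `pickle` module. If it was saved with joblib, `joblib.load` would replace `pickle.load`. `load_model` and `DEFAULT_MODEL_PATH` are hypothetical names.

```python
import os
import pickle
from typing import Any, Optional

# Default mirrors the MODEL_PATH value suggested for Render above; a relative
# path resolves against the working directory (backend/ when started there).
DEFAULT_MODEL_PATH = "./models/log_model.pkl"


def load_model(path: Optional[str] = None) -> Any:
    """Load the pickled classifier from `path`, else MODEL_PATH, else the default."""
    model_path = path or os.environ.get("MODEL_PATH", DEFAULT_MODEL_PATH)
    if not os.path.isfile(model_path):
        raise FileNotFoundError(f"Model file not found: {model_path!r} (check MODEL_PATH)")
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(model_path, "rb") as fh:
        return pickle.load(fh)
```

Failing fast with a clear message makes a mis-set MODEL_PATH easy to spot in the Render logs.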
3οΈβ£ Update Frontend URL
After the backend is deployed, copy its Render URL and update API_URL in dashboard/st_app.py:
API_URL = "https://your-backend.onrender.com/classify"
Then commit & push:
git add .
git commit -m "Updated backend API URL"
git push origin main
Render will automatically rebuild and redeploy your frontend service.
Example Input & Output

| source | log_message |
|---|---|
| app1 | Connection failed due to timeout |
| app2 | FileNotFoundError: dataset.csv missing |
Predicted Output:
| source | log_message | target_label |
|---|---|---|
| app1 | Connection failed due to timeout | NetworkError |
| app2 | FileNotFoundError: dataset.csv missing | IOError |

🧾 Logging
The system logs:
API requests and responses
Model inference times
Confidence thresholds and fallback logic
Logs appear in your Render dashboard or local Docker console.
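Inference times and fallback decisions like these can be recorded with the standard `logging` module. This is a minimal sketch, not the project's logging code. The logger name `log_classifier` and the helper `timed_classify` are hypothetical. `classify` stands for any callable that returns `(label, layer)`.

```python
import logging
import time
from typing import Callable, Tuple

logger = logging.getLogger("log_classifier")  # hypothetical logger name


def timed_classify(
    message: str, classify: Callable[[str], Tuple[str, str]]
) -> Tuple[str, str]:
    """Run classify(message) -> (label, layer) and log latency and the tier used."""
    start = time.perf_counter()
    label, layer = classify(message)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("label=%s layer=%s inference_ms=%.2f", label, layer, elapsed_ms)
    if layer == "llm":
        # Flag fallbacks separately; %.80s truncates long messages to 80 characters.
        logger.warning("LLM fallback used for: %.80s", message)
    return label, layer


if __name__ == "__main__":
    # Writes to stderr, which Docker and Render capture as container logs.
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    timed_classify("Connection failed due to timeout", lambda m: ("NetworkError", "regex"))
```

Logging to stderr rather than to a file keeps the output visible in both the local Docker console and the Render dashboard without extra configuration.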
Tech Stack

| Category | Tools Used |
|---|---|
| Language | Python |
| Frameworks | FastAPI, Streamlit |
| ML/NLP | Sentence-Transformers, Scikit-learn |
| Data Viz | Plotly, Matplotlib |
| Deployment | Docker, Render |
| Version Control | Git & GitHub |

👨‍💻 Author
Aditya Mangal
Data Science & ML Developer
LinkedIn
GitHub