Skip to content

Adityamangal101/Log-Classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

21 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🧠 Log Classification with Hybrid NLP Framework

This project implements an intelligent Hybrid Log Classification System combining Regex, Machine Learning (Sentence Transformer + Logistic Regression), and LLM-based reasoning to classify log messages into meaningful categories.

It’s a complete end-to-end production system, featuring separate FastAPI backend and Streamlit frontendβ€”packaged via Docker and deployable on Render.

πŸ—οΈ Project Architecture Log_Classification_Project/ β”‚ β”œβ”€β”€ backend/ β”‚ β”œβ”€β”€ api/ β”‚ β”‚ └── server.py # FastAPI backend API β”‚ β”œβ”€β”€ src/ β”‚ β”‚ β”œβ”€β”€ classify.py # Core hybrid classification pipeline β”‚ β”‚ β”œβ”€β”€ processor_regex.py # Regex-based classification logic β”‚ β”‚ β”œβ”€β”€ preprocess.py # Text cleaning and preprocessing β”‚ β”œβ”€β”€ models/ β”‚ β”‚ β”œβ”€β”€ log_model.pkl # Logistic Regression model β”‚ β”‚ └── sentence_transformer/ # Sentence Transformer model β”‚ β”œβ”€β”€ requirements.txt β”‚ β”œβ”€β”€ Dockerfile β”‚ └── start.sh # Backend startup script β”‚ β”œβ”€β”€ frontend/ β”‚ β”œβ”€β”€ dashboard/ β”‚ β”‚ └── st_app.py # Streamlit dashboard for visualization β”‚ β”œβ”€β”€ requirements.txt β”‚ β”œβ”€β”€ Dockerfile β”‚ └── start.sh # Frontend startup script β”‚ β”œβ”€β”€ resources/ β”‚ β”œβ”€β”€ example_logs.csv β”‚ └── architecture.png β”‚ └── README.md

βš™οΈ Hybrid Classification Workflow

The hybrid classifier follows a three-tiered logic flow:

🧩 Regex Layer Quickly detects well-known or patterned log formats (e.g. ERROR 404, Timeout, Disk full).

πŸ” ML Layer (Sentence Transformer + Logistic Regression) Encodes log messages and predicts labels using a trained model for structured log data.

🧠 LLM Fallback Layer Invoked when the above two models return uncertain predictions or ambiguous messages.

πŸš€ Local Setup 1️⃣ Clone the Repository git clone https://github.com//log-classification.git cd log-classification

2️⃣ Backend Setup cd backend python -m venv logenv logenv\Scripts\activate # (Windows) pip install -r requirements.txt uvicorn api.server:app --host 0.0.0.0 --port 8000

Endpoints:

API Root β†’ http://127.0.0.1:8000/

Swagger UI β†’ http://127.0.0.1:8000/docs

Redoc β†’ http://127.0.0.1:8000/redoc

3️⃣ Frontend Setup cd ../frontend pip install -r requirements.txt streamlit run dashboard/st_app.py

Once running, visit β†’ http://localhost:8501

🧩 Update API_URL inside your Streamlit app to point to your deployed backend API:

API_URL = "https://your-backend-service.onrender.com/classify"

🐳 Docker Setup

Each module (backend + frontend) has its own Dockerfile.

🧠 Build Backend Image cd backend docker build -t log-backend . docker run -p 8000:8000 log-backend

πŸ“Š Build Frontend Image cd ../frontend docker build -t log-frontend . docker run -p 8501:8501 log-frontend

🌐 Deploy on Render 1️⃣ Create Two Services: Service Folder Type Port Backend Service /backend FastAPI (Docker) 8000 Frontend Service /frontend Streamlit (Docker) 8501 2️⃣ Environment Variables

In both Render services:

PYTHON_VERSION = 3.10

For backend, also add:

MODEL_PATH = ./models/log_model.pkl

3️⃣ Update Frontend URL

After backend deploy, copy its Render URL and update in st_app.py:

API_URL = "https://your-backend.onrender.com/classify"

Then commit & push:

git add . git commit -m "Updated backend API URL" git push origin main

Render will automatically rebuild and redeploy your frontend service.

πŸ“Š Example Input & Output source log_message app1 Connection failed due to timeout app2 FileNotFoundError: dataset.csv missing

Predicted Output:

source log_message target_label app1 Connection failed due to timeout NetworkError app2 FileNotFoundError: dataset.csv missing IOError 🧾 Logging

The system logs:

API requests and responses

Model inference times

Confidence thresholds and fallback logic

Logs appear in your Render dashboard or local Docker console.

🧠 Tech Stack Category Tools Used Language Python Frameworks FastAPI, Streamlit ML/NLP Sentence-Transformers, Scikit-learn Data Viz Plotly, Matplotlib Deployment Docker, Render Version Control Git & GitHub πŸ‘¨β€πŸ’» Author

Aditya Mangal πŸ’Ό Data Science & ML Developer πŸ”— LinkedIn

πŸ“‚ GitHub

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors