This is an LLM project that trains a PyTorch GPT model on CommonCrawl data and generates text from prompts. Flask serves inference behind a basic chat interface that lets the user converse with the model. Training took 9 hours, and the resulting model can complete sentences from basic prompts.
- Name: JenkinsLLM
- Tech Stack: Python, PyTorch, Jenkins, Docker, SonarQube, Prometheus
- Stages: 7 (Build, Test, Quality, Security, Deploy, Release, Monitoring)
- Large language model that completes sentences from basic prompts
- Automated 7-stage DevOps pipeline
- Security scanning with Bandit and Safety
- Code quality analysis with SonarQube
- Production monitoring with Prometheus
| Category | Technologies | Purpose |
|---|---|---|
| CI/CD | Jenkins | Pipeline orchestration |
| Containerization | Docker | Application packaging |
| Code Quality | SonarQube | Static analysis & coverage |
| Security | Safety, Bandit | Dependency & vulnerability scanning |
| Monitoring | Prometheus | Metrics collection |
| Application | Python, PyTorch, Flask | Training & inference |
| Testing | unittest | Unit tests |
Build: Packages the entire project into a Docker image with health checks and version tagging.
Test: Runs the Python class tests with unittest and measures code coverage.
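As a hedged sketch of what a test in this stage might look like (the class and test names here are hypothetical, not taken from the actual repository):

```python
import unittest


class ModelConfig:
    """Hypothetical stand-in for the project's config class."""

    def __init__(self):
        self.vocab_size = 10000
        self.d_model = 512
        self.n_heads = 4


class TestModelConfig(unittest.TestCase):
    def test_heads_divide_model_dim(self):
        cfg = ModelConfig()
        # Each attention head must get an integer slice of d_model
        self.assertEqual(cfg.d_model % cfg.n_heads, 0)
```

The stage could then run the suite with coverage.py, e.g. `coverage run -m unittest discover && coverage report`.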
Quality: Runs SonarQube static analysis to flag code issues and basic vulnerabilities.
Security: Runs Safety and Bandit security scans, checking for vulnerable dependencies and common security issues in the code.
Deploy: Uses Docker to deploy the test site on localhost:5001 with health checks and logging.
Release: Releases the production version on localhost:5000, creates release notes, and sends email notifications, using a simplified Blue-Green deployment strategy.
Monitoring: Provides 3 endpoints to check application status:
- /health: Application health
- /metrics: Prometheus metrics
- /status: System resources
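A minimal sketch of how endpoints like these could be wired up in Flask; the payload fields and metric name here are assumptions, not the project's actual schema:

```python
import os
import time

from flask import Flask, jsonify

app = Flask(__name__)
START_TIME = time.time()


@app.route("/health")
def health():
    # Liveness check: returns 200 as long as the process is up
    return jsonify(status="ok", uptime_seconds=round(time.time() - START_TIME))


@app.route("/metrics")
def metrics():
    # Prometheus scrapes plain-text samples: "<metric_name> <value>"
    body = f"app_uptime_seconds {time.time() - START_TIME:.0f}\n"
    return body, 200, {"Content-Type": "text/plain; version=0.0.4"}


@app.route("/status")
def status():
    # System resources; os.getloadavg() is POSIX-only
    load1, load5, load15 = os.getloadavg()
    return jsonify(load_1m=load1, load_5m=load5, load_15m=load15)
```

In practice the `/metrics` endpoint would typically be generated by a Prometheus client library rather than formatted by hand.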
- Model Size: 9M parameters
- Architecture: 3 layers, 4 attention heads, 512 hidden dimensions
- Training Data: 10GB of raw CommonCrawl data, cleaned down to 1GB
- Inference: Flask API
Data processing pipeline with 5 stages: Data Cleaning → Preparation → Model Creation → Training → Inference
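The five stages above can be sketched as a simple runner that executes them in order and passes shared state along; the stage bodies here are placeholders, not the project's real implementations:

```python
def data_cleaning(state):
    # Drop empty documents (real stage applies the filters described below)
    state["docs"] = [d.strip() for d in state["docs"] if d.strip()]
    return state

def preparation(state):
    state["tokens"] = [d.split() for d in state["docs"]]
    return state

def model_creation(state):
    state["model"] = {"n_layers": 3, "n_heads": 4}  # placeholder config
    return state

def training(state):
    state["trained"] = True  # real stage would run the PyTorch training loop
    return state

def inference(state):
    return state

PIPELINE = [data_cleaning, preparation, model_creation, training, inference]

def run_pipeline(docs):
    state = {"docs": docs}
    for stage in PIPELINE:
        state = stage(state)  # each stage receives and returns shared state
    return state
```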
Three main steps in the data cleaning process:
- Filtering: Length Filter, Language Detection, Encoding Validation
- Diversity checks: Line diversity, Word diversity, Sentence, Content ratio
- Deduplication: Similar duplicates, Very similar duplicates, Cross document
This reduces the data by 90%, from 10GB of raw data to 1GB of cleaned data.
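A stdlib-only sketch of the kind of filters involved: a length filter, an encoding check, and hash-based exact deduplication. The thresholds are illustrative, not the project's actual values:

```python
import hashlib

MIN_CHARS, MAX_CHARS = 20, 10_000  # illustrative length bounds

def passes_length_filter(doc: str) -> bool:
    return MIN_CHARS <= len(doc) <= MAX_CHARS

def passes_encoding_check(doc: str) -> bool:
    # Reject documents with a high ratio of non-printable characters
    bad = sum(1 for ch in doc if not ch.isprintable() and ch not in "\n\t")
    return bad / max(len(doc), 1) < 0.01

def deduplicate(docs):
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha1(doc.encode("utf-8")).hexdigest()
        if digest not in seen:  # keep only the first copy of each document
            seen.add(digest)
            kept.append(doc)
    return kept

def clean(docs):
    docs = [d for d in docs if passes_length_filter(d) and passes_encoding_check(d)]
    return deduplicate(docs)
```

Near-duplicate and cross-document deduplication would need fuzzier matching (e.g. MinHash) than the exact hashing shown here.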
# Model size
self.vocab_size = 10000
self.d_model = 512
self.n_heads = 4
self.n_layers = 3
self.d_ff = int(self.d_model * 2.7) # SwiGLU activation
self.max_seq_len = 32
# Training
self.dropout = 0.005
self.learning_rate = 0.0004
self.batch_size = 128

- Docker
- Jenkins
- SonarQube Server
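From the config values above, two derived sizes can be double-checked: the per-head dimension and the SwiGLU feed-forward width:

```python
d_model, n_heads = 512, 4

head_dim = d_model // n_heads  # 512 / 4 = 128 dimensions per attention head
d_ff = int(d_model * 2.7)      # ~(8/3) * d_model, the usual SwiGLU sizing

print(head_dim, d_ff)
```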
# Clone repository
git clone https://github.com/wilsnd/JenkinsLLM
# Build and run with Docker
docker-compose up -d
# Access application
# Test environment: http://localhost:5001
# Production: http://localhost:5000

The Jenkins pipeline automatically triggers on code changes and runs through all 7 stages with email notifications.
- Fix the SonarQube quality issues
- Try Kubernetes
- Optimize the code, mainly the data preparation step, to make it more efficient
- Train with more data
- Improve the model
- Package only the necessary files