MalwareGuard – Malware Classification Dashboard

MalwareGuard is an XGBoost-based malware classifier with a modern Flask web dashboard. It uses the ClaMP Integrated malware dataset to train a binary classifier that labels samples as Malware or Benign and visualizes the predictions in a clean UI.

Features

- XGBoost-based malware classification on PE features (ClaMP Integrated dataset)
- Trainable model with a saved artifact (malware_xgb.joblib)
- Stylish Flask web dashboard with:
  - CSV upload
  - Summary stats (total samples, malware vs. benign counts)
  - Top 50 prediction results with malware probability
- Categorical fields (e.g. packer_type) encoded consistently between training and inference
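Consistent encoding means the category-to-number mapping learned during training is reused unchanged at inference time. The training section describes these encoding maps being saved alongside the model. Below is a minimal plain-Python sketch of the idea. The function names, the sorted-order codes, and the -1 code for packers never seen in training are assumptions, not necessarily what train_malware_model.py does.

```python
UNKNOWN_CODE = -1  # hypothetical code for categories never seen in training


def fit_category_map(values):
    """Assign a stable integer code to each distinct training value."""
    # Sorting makes the codes deterministic regardless of row order.
    return {value: code for code, value in enumerate(sorted(set(values)))}


def apply_category_map(values, mapping):
    """Encode values with a previously fitted map; unseen values get UNKNOWN_CODE."""
    return [mapping.get(value, UNKNOWN_CODE) for value in values]


# Made-up packer names, for illustration only.
train_packers = ["UPX", "NoPacker", "ASPack", "UPX"]
packer_map = fit_category_map(train_packers)
print(packer_map)  # {'ASPack': 0, 'NoPacker': 1, 'UPX': 2}

# At inference time the same map is reused; "Themida" was never seen in training.
print(apply_category_map(["UPX", "Themida"], packer_map))  # [2, -1]
```

Because the map is stored with the model rather than rebuilt from the uploaded CSV, the same packer always receives the same code, even when an upload contains only a subset of packers.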

Project Structure

malware-classification-dashboard/
│
├── app.py                     # Flask web app (upload + results pages)
├── train_malware_model.py     # trains XGBoost model and saves malware_xgb.joblib
├── make_test_csv.py           # helper to generate test CSVs from the dataset
├── malware_xgb.joblib         # trained model artifact (generated)
├── requirements.txt           # Python dependencies
├── README.md
│
├── data/
│   ├── ClaMP_Integrated-5184.csv   # main training dataset (from Kaggle)
│   ├── ClaMP_Raw-5184.csv          # optional raw features
│   ├── test_with_labels.csv        # mix of malware/benign with labels (generated)
│   └── test_for_app.csv            # same, but without labels (for UI upload)
│
├── templates/
│   ├── index.html              # upload page
│   └── results.html            # results dashboard
│
└── static/
    └── styles.css              # custom dark-theme styling

Dataset

This project uses the ClaMP malware dataset from Kaggle ("Classification of Malwares – ClaMP dataset").

Download ClaMP_Integrated-5184.csv and place it in the data/ directory. In the code, the path is set as:

CSV_PATH = "data/ClaMP_Integrated-5184.csv"
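Here is a minimal sketch of reading the dataset and separating labels from features using only Python's csv module. It assumes the label column is named `class` (the test CSV description refers to a "class label"), with 1 for malware and 0 for benign. `split_features_and_labels` and `load_dataset` are hypothetical helpers, not the project's own code.

```python
import csv

CSV_PATH = "data/ClaMP_Integrated-5184.csv"
LABEL_COL = "class"  # assumption: label column, 1 = malware, 0 = benign


def split_features_and_labels(lines, label_col=LABEL_COL):
    """Parse CSV lines (an open file or a list of strings) into feature dicts and int labels."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or label_col not in reader.fieldnames:
        raise ValueError(f"CSV has no '{label_col}' column")
    features, labels = [], []
    for row in reader:
        labels.append(int(float(row.pop(label_col))))  # accepts "1" and "1.0"
        features.append(row)  # remaining columns, still as strings
    return features, labels


def load_dataset(path=CSV_PATH):
    """Open the dataset file; raises FileNotFoundError if it has not been downloaded yet."""
    with open(path, newline="") as f:
        return split_features_and_labels(f)


# Tiny inline sample with made-up values, standing in for the real file.
sample = ["filesize,packer_type,class", "36864,UPX,1", "90112,NoPacker,0"]
X, y = split_features_and_labels(sample)
print(y)     # [1, 0]
print(X[0])  # {'filesize': '36864', 'packer_type': 'UPX'}
```

Once the real file is in place, `features, labels = load_dataset()` reads it. A FileNotFoundError at that point usually means the CSV has not been placed in data/ yet.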

Setup

Clone the repo

git clone https://github.com/Deb-26/Malware-Classification-ML.git
cd Malware-Classification-ML/Classification_of_Malware

Install dependencies

pip install -r requirements.txt

Place the dataset

Download ClaMP_Integrated-5184.csv from Kaggle and put it into:

data/ClaMP_Integrated-5184.csv

Training the model

Run:

python train_malware_model.py

This will:

- Load data/ClaMP_Integrated-5184.csv
- Encode categorical columns (e.g. packer_type)
- Split the data into train / validation / test sets
- Train an XGBoost classifier
- Print metrics (accuracy, ROC-AUC)
- Save the model and encoding maps to:

malware_xgb.joblib
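For reference, both printed metrics can be computed without any library. Accuracy is the share of correct labels. ROC-AUC equals the probability that a randomly chosen malware sample receives a higher score than a randomly chosen benign one, with ties counting as half. This dependency-free sketch is for illustration only; the training script may instead rely on a library such as scikit-learn (`accuracy_score`, `roc_auc_score`). The 0.5 threshold and the sample values are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random malware sample (label 1)
    scores higher than a random benign sample (label 0); ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes in y_true")
    wins = 0.0
    for p in pos:          # compare every malware score ...
        for n in neg:      # ... against every benign score
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 0, 1]
probs = [0.9, 0.2, 0.4, 0.6, 0.8]          # made-up malware probabilities
y_pred = [int(p >= 0.5) for p in probs]    # 0.5 threshold is an assumption
print(accuracy(y_true, y_pred))  # 0.6
print(roc_auc(y_true, probs))    # about 0.833 (5 of 6 pairs ranked correctly)
```

The pairwise loop is quadratic. That is fine for a test split of a few thousand rows, but a rank-based formula scales better.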

Creating a test CSV

To generate balanced test CSVs (a mix of malware and benign samples):

python make_test_csv.py

This creates:

- data/test_with_labels.csv – still has the class label (for evaluation)
- data/test_for_app.csv – no label, suitable for uploading in the web UI
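The internals of make_test_csv.py are not described here. The following is one plausible sketch of producing the two files with the csv module: draw the same number of rows from each class, shuffle them, and write the sample twice, with and without the label. The 50-rows-per-class sample size, the fixed seed, the `class` label column, and all function names are assumptions.

```python
import csv
import random

LABEL_COL = "class"  # assumption: label column name in the ClaMP CSV


def make_test_rows(rows, label_col=LABEL_COL, per_class=50, seed=42):
    """Draw up to `per_class` rows of each label and shuffle them together."""
    rng = random.Random(seed)  # fixed seed, so the sample is reproducible
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_col], []).append(row)
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        sample.extend(rng.sample(group, min(per_class, len(group))))
    rng.shuffle(sample)
    return sample


def write_csv(path, rows, fieldnames):
    """Write rows, keeping only `fieldnames` (extra keys such as the label are dropped)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def make_test_csvs(src="data/ClaMP_Integrated-5184.csv",
                   with_labels="data/test_with_labels.csv",
                   for_app="data/test_for_app.csv"):
    """Create a labelled test file and an unlabelled copy for the upload page."""
    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = list(reader.fieldnames)
    sample = make_test_rows(rows)
    write_csv(with_labels, sample, columns)
    write_csv(for_app, sample, [c for c in columns if c != LABEL_COL])


# In-memory demo with made-up rows: 5 of each class, 2 drawn per class.
demo_rows = [{"filesize": str(i), "class": str(i % 2)} for i in range(10)]
demo = make_test_rows(demo_rows, per_class=2)
print(len(demo), sorted(r["class"] for r in demo))  # 4 ['0', '0', '1', '1']
```

Writing both files from the same `sample` keeps test_for_app.csv row-aligned with test_with_labels.csv, so predictions from the UI can be checked against the labelled copy.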

Running the web app

Make sure malware_xgb.joblib exists (after training), then start Flask:

python app.py

By default, the app runs at:

http://127.0.0.1:5000/

Flow

- Open the URL in your browser.
- On the upload page, select data/test_for_app.csv (or any CSV with the same feature columns).
- Click “Run Malware Analysis”.
- You’ll be redirected to the results page, which shows:
  - Total samples
  - Predicted malware count and percentage
  - Predicted benign count
  - A table of up to 50 rows with:
    - filesize
    - packer_type
    - E_file
    - fileinfo
    - malware_probability (%)
    - prediction_label (Malware / Benign)
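The summary cards and the table can all be derived from the per-row malware probabilities. This sketch assumes a 0.5 decision threshold and probabilities in [0, 1]. `summarize` is a hypothetical helper; only the `malware_probability` and `prediction_label` column names come from the results page description.

```python
def summarize(probabilities, threshold=0.5, top_n=50):
    """Build the numbers shown on the results page from per-row malware probabilities."""
    labels = ["Malware" if p >= threshold else "Benign" for p in probabilities]
    total = len(labels)
    malware = labels.count("Malware")
    return {
        "total": total,
        "malware": malware,
        "malware_pct": round(100 * malware / total, 1) if total else 0.0,
        "benign": total - malware,
        # Only the first `top_n` rows go into the table; probability shown in %.
        "rows": [
            {"malware_probability": round(100 * p, 2), "prediction_label": lab}
            for p, lab in zip(probabilities[:top_n], labels[:top_n])
        ],
    }


stats = summarize([0.75, 0.25, 0.5, 0.125])  # made-up probabilities
print(stats["total"], stats["malware"], stats["malware_pct"], stats["benign"])  # 4 2 50.0 2
print(stats["rows"][0])  # {'malware_probability': 75.0, 'prediction_label': 'Malware'}
```

The counts cover every uploaded row, while the table is capped at 50 rows.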

Screenshots

[Screenshot: Dashboard]

[Screenshots: Output, showing the results summary cards and prediction table for each of these inputs]
- test_for_app.csv
- test_with_labels.csv
- ClaMP_Integrated-5184.csv

Possible Improvements

- Add a download button to export predictions as CSV
- Color-coded risk levels based on malware probability
- An API endpoint (/api/predict) that accepts JSON
- Model comparison (RandomForest vs. XGBoost)
- A Dockerfile for containerized deployment
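As an illustration of the color-coded risk levels idea, a small function could map each malware probability to a level and a CSS color for the results table. The cut-offs (0.9 and 0.5), level names, and hex colors are all hypothetical.

```python
RISK_LEVELS = [  # hypothetical cut-offs, checked from highest to lowest
    (0.9, "high", "#e5484d"),
    (0.5, "medium", "#f5a524"),
    (0.0, "low", "#30a46c"),
]


def risk_level(probability):
    """Map a malware probability in [0, 1] to a (level, CSS color) pair."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    for cutoff, level, color in RISK_LEVELS:
        if probability >= cutoff:
            return level, color


print(risk_level(0.95))  # ('high', '#e5484d')
print(risk_level(0.62))  # ('medium', '#f5a524')
print(risk_level(0.1))   # ('low', '#30a46c')
```

Keeping the thresholds in one table makes them easy to tune, and the level name could double as a CSS class in the dark-theme stylesheet.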
