MalwareGuard is an XGBoost-based malware classifier with a modern Flask web dashboard. It uses the ClaMP Integrated malware dataset to train a binary classifier that labels samples as Malware or Benign and visualizes the predictions in a clean UI.
- XGBoost-based malware classification on PE features (ClaMP Integrated dataset)
- Trainable model with a saved artifact (malware_xgb.joblib)
- Stylish Flask web dashboard:
  - CSV upload
  - Summary stats (total samples, malware vs. benign)
  - Top 50 prediction results with probability
- Encodes categorical fields (e.g. packer_type) consistently between training and inference
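A minimal sketch of what consistent encoding means here. It assumes the encoding maps are plain dictionaries from category string to integer, built once from the training data and reused unchanged at inference. The function names build_encoding_map and encode, and the -1 code for unseen categories, are hypothetical and not taken from train_malware_model.py.

```python
from typing import Dict, Iterable, List


def build_encoding_map(values: Iterable[str]) -> Dict[str, int]:
    """Assign a stable integer code to each distinct category seen in training.

    Sorting makes the codes deterministic, independent of row order.
    """
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values: Iterable[str], mapping: Dict[str, int], unknown: int = -1) -> List[int]:
    """Encode categories with a map built at training time.

    Categories never seen during training get the (hypothetical) `unknown` code
    instead of a fresh integer, so inference never invents new codes.
    """
    return [mapping.get(value, unknown) for value in values]


if __name__ == "__main__":
    # Hypothetical packer_type values; the real dataset's categories may differ.
    packer_map = build_encoding_map(["UPX", "NoPacker", "ASPack", "UPX"])
    print(packer_map)  # {'ASPack': 0, 'NoPacker': 1, 'UPX': 2}
    print(encode(["UPX", "Themida"], packer_map))  # [2, -1]
```

Saving this dictionary next to the model (as the training step does with its encoding maps) is what keeps the integer codes identical between training and the web app.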
malware-classification-dashboard/
│
├── app.py # Flask web app (upload + results pages)
├── train_malware_model.py # trains XGBoost model and saves malware_xgb.joblib
├── make_test_csv.py # helper to generate test CSVs from the dataset
├── malware_xgb.joblib # trained model artifact (generated)
├── requirements.txt # Python dependencies
├── README.md
│
├── data/
│ ├── ClaMP_Integrated-5184.csv # main training dataset (from Kaggle)
│ ├── ClaMP_Raw-5184.csv # optional raw features
│ ├── test_with_labels.csv # mix of malware/benign with labels (generated)
│ └── test_for_app.csv # same, but without labels (for UI upload)
│
├── templates/
│ ├── index.html # upload page
│ └── results.html # results dashboard
│
└── static/
└── styles.css # custom dark-theme styling
- This project uses the ClaMP malware dataset:
  - Kaggle: Classification of Malwares – ClaMP dataset
- You need to download ClaMP_Integrated-5184.csv and place it in the data/ directory.
- In the code, the path is:
  CSV_PATH = "data/ClaMP_Integrated-5184.csv"
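A small sketch of loading the dataset from CSV_PATH with a clear error when the file has not been downloaded yet. It uses only the standard library; the column names in REQUIRED_COLUMNS (packer_type and a class label column) are assumptions, so check them against the header of your copy.

```python
import csv
import os
from typing import Dict, List, Tuple

CSV_PATH = "data/ClaMP_Integrated-5184.csv"

# Assumed column names; verify against the header of the downloaded CSV.
REQUIRED_COLUMNS = ("packer_type", "class")


def load_dataset(path: str = CSV_PATH) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read the ClaMP CSV into a header list and a list of row dicts."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found: download ClaMP_Integrated-5184.csv from Kaggle "
            "and place it in the data/ directory"
        )
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = list(reader.fieldnames or [])
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"{path} is missing expected columns: {missing}")
        rows = list(reader)
    return header, rows
```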
- Clone the repo:
  git clone https://github.com/Deb-26/Malware-Classification-ML.git
  cd Malware-Classification-ML
- Install dependencies:
  pip install -r requirements.txt
- Place the dataset: download ClaMP_Integrated-5184.csv from Kaggle and put it into:
  data/ClaMP_Integrated-5184.csv
To train the model, run:
python train_malware_model.py
This will:
- Load data/ClaMP_Integrated-5184.csv
- Encode categorical columns (e.g. packer_type)
- Split into train / validation / test sets
- Train an XGBoost classifier
- Print metrics (accuracy, ROC-AUC)
- Save the model + encoding maps to:
malware_xgb.joblib
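The split and metric steps above can be sketched with the standard library alone. This is not the training script's code: a library-based version would typically use helpers such as scikit-learn's train_test_split and roc_auc_score, and the 70/15/15 ratios and seed below are hypothetical. ROC-AUC is computed here in its Mann-Whitney form: the probability that a randomly chosen malware sample scores higher than a randomly chosen benign one.

```python
import random
from typing import List, Sequence, Tuple


def train_val_test_split(
    n: int, val_frac: float = 0.15, test_frac: float = 0.15, seed: int = 42
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle row indices once and cut them into three disjoint index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(random positive outscores random negative); ties count half.

    O(positives * negatives), which is fine for a test split of a few hundred rows.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Accuracy depends on a decision threshold, while ROC-AUC only looks at how the probabilities rank malware against benign samples, which is why both are worth printing.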
To generate a balanced test CSV (mixture of malware + benign):
python make_test_csv.py
This creates:
- data/test_with_labels.csv: still has the class label (for evaluation)
- data/test_for_app.csv: no label, suitable for uploading in the web UI
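A sketch of how a balanced pair of test CSVs could be produced, assuming the rows are already loaded as dicts, the label column is named class, and malware is labelled "1" and benign "0". The function name, n_per_class and seed are hypothetical; make_test_csv.py may work differently.

```python
import csv
import random
from typing import Dict, List


def make_balanced_test_csvs(
    rows: List[Dict[str, str]],
    with_labels_path: str,
    for_app_path: str,
    n_per_class: int = 50,
    label_col: str = "class",
    seed: int = 0,
) -> None:
    """Sample equal malware ("1") and benign ("0") rows and write two CSVs."""
    rng = random.Random(seed)
    malware = [r for r in rows if r[label_col] == "1"]
    benign = [r for r in rows if r[label_col] == "0"]
    k = min(n_per_class, len(malware), len(benign))
    sample = rng.sample(malware, k) + rng.sample(benign, k)
    rng.shuffle(sample)  # mix classes so the file is not sorted by label

    fieldnames = list(rows[0].keys())
    with open(with_labels_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(sample)

    # Same rows with the label column dropped, for uploading in the web UI.
    app_fields = [c for c in fieldnames if c != label_col]
    with open(for_app_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=app_fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(sample)
```

Writing both files from the same sampled rows means the labelled file can later be used to check the predictions the dashboard shows for the unlabelled one.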
Make sure malware_xgb.joblib exists (after training), then start Flask:
python app.py
By default, the app runs at:
http://127.0.0.1:5000/
Flow
- Open the URL in your browser.
- On the upload page, select data/test_for_app.csv (or any CSV with the same feature columns).
- Click “Run Malware Analysis”.
- You’ll be redirected to the results page, which shows:
  - Total samples
  - Predicted malware count & percentage
  - Predicted benign count
  - Table of up to 50 rows with: filesize, packer_type, E_file, fileinfo, malware_probability (%), prediction_label (Malware / Benign)
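The results-page numbers can be sketched as a pure function over per-sample malware probabilities (in the app these would plausibly come from the model's predict_proba output). The 0.5 decision threshold is an assumption, and because the README does not say whether the table shows the first 50 rows or the 50 highest-probability rows, this sketch simply takes the first top_n.

```python
from typing import Dict, List, Sequence


def summarize_predictions(
    probabilities: Sequence[float], threshold: float = 0.5, top_n: int = 50
) -> Dict[str, object]:
    """Compute dashboard summary stats from per-sample malware probabilities."""
    labels = ["Malware" if p >= threshold else "Benign" for p in probabilities]
    total = len(labels)
    malware = labels.count("Malware")
    rows: List[Dict[str, object]] = [
        {"malware_probability": round(p * 100, 2), "prediction_label": label}
        for p, label in zip(probabilities, labels)
    ][:top_n]
    return {
        "total": total,
        "malware": malware,
        "malware_pct": round(100 * malware / total, 1) if total else 0.0,
        "benign": total - malware,
        "rows": rows,
    }
```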
Possible future improvements:
- Add a download button to export predictions as CSV
- Color-coded risk levels based on probability
- API endpoint (/api/predict) that accepts JSON
- Model comparison (RandomForest vs XGBoost)
- Dockerfile for containerized deployment
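For the color-coded risk levels item, one possible approach is to bucket each malware probability into a tier and attach a CSS class to the table row. The 0.8 and 0.5 cut-offs and the risk-* class names are hypothetical choices, not anything the project defines yet.

```python
def risk_level(probability: float) -> str:
    """Map a malware probability in [0, 1] to a hypothetical risk tier."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= 0.8:
        return "high"
    if probability >= 0.5:
        return "medium"
    return "low"


# Hypothetical CSS classes a results template could attach to each table row.
RISK_CSS_CLASS = {"high": "risk-high", "medium": "risk-medium", "low": "risk-low"}
```

Keeping the tiering in Python rather than in the template makes the thresholds easy to test and reuse in a JSON API response.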