This repository contains code for a machine learning model that classifies emails as spam or non-spam (ham). The model uses a Support Vector Machine (SVM) algorithm with a text classification approach.
The project covers training, evaluating, and tuning the classifier. SVM was chosen because it is effective for text classification tasks.
- Text classification using Support Vector Machine (SVM)
- Model training, evaluation, and tuning scripts
- Python 3.6+
- To use the Email Spam Classification code, follow these steps:
- Prepare Your Data: Ensure your email data is in a suitable format for the model, for example a CSV file with columns for email text and labels.
- Train the Model: Use the train_model.py script to train the SVM model on your training data.
- Make Predictions: After training, you can use the model to make predictions on new email data.
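The three steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of train_model.py: it assumes scikit-learn is installed, uses TF-IDF features (an assumption, since the feature extraction method is not stated here), and the column names `text` and `label` and the sample rows are hypothetical.

```python
import csv
import io

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical stand-in for a CSV file with "text" and "label" columns.
SAMPLE_CSV = """text,label
"Win a free prize now, click here",spam
"Cheap loans approved instantly, apply now",spam
"Claim your free vacation prize today",spam
"Meeting moved to 3pm tomorrow",ham
"Please review the attached project report",ham
"Lunch on Friday with the team?",ham
"""


def load_emails(csv_file):
    """Read email texts and labels from an open CSV file object."""
    rows = list(csv.DictReader(csv_file))
    texts = [row["text"] for row in rows]
    labels = [row["label"] for row in rows]
    return texts, labels


# Step 1: prepare the data.
texts, labels = load_emails(io.StringIO(SAMPLE_CSV))

# Step 2: train. TF-IDF turns each email into a numeric vector,
# and the SVM learns to separate spam vectors from ham vectors.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", C=1.0))
model.fit(texts, labels)

# Step 3: predict labels for new, unseen emails.
new_emails = ["Free prize, click now", "Can we reschedule the project meeting?"]
predictions = model.predict(new_emails)
for email, label in zip(new_emails, predictions):
    print(f"{label}: {email}")
```

To read a real file instead of the inline sample, pass `open("your_file.csv", newline="", encoding="utf-8")` to `load_emails`, where the file name is a placeholder for your own data.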
- The dataset used for this project can be found at Dataset Source.
- To train the SVM model, fit it on the training portion of your data. Make sure the data is properly formatted and split into training and testing sets first.
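One possible way to produce the training and testing sets is scikit-learn's `train_test_split`. The sketch below assumes scikit-learn is available; the toy texts, the 80/20 ratio, and the fixed random seed are illustrative choices, not values taken from this project.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data; replace with your own email texts and labels.
texts = [f"spam message {i}" for i in range(10)] + [
    f"regular message {i}" for i in range(10)
]
labels = ["spam"] * 10 + ["ham"] * 10

# stratify=labels keeps the spam/ham ratio the same in both sets,
# which matters when the dataset is small or imbalanced.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

print(f"training emails: {len(X_train)}, testing emails: {len(X_test)}")
```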
- Evaluate the model's performance on the testing set to measure accuracy and other relevant metrics.
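As an illustration of what such an evaluation might report, the sketch below computes accuracy, a confusion matrix, and per-class precision and recall with scikit-learn (assumed to be installed). The label lists are made up for the example and do not come from this project's results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true labels and model predictions for eight test emails.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "spam"]

# Fraction of emails classified correctly.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in the order given.
matrix = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])

print(f"Accuracy: {accuracy:.2f}")
print(matrix)
# Precision, recall, and F1 per class; useful because accuracy alone
# can hide poor spam recall when most emails are ham.
print(classification_report(y_true, y_pred, labels=["ham", "spam"]))
```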
- Experiment with hyperparameter tuning to improve model performance. Adjust the parameters in the tuned_param dictionary within the train_model.py script:
tuned_param = {'kernel': ['linear', 'rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000, 10000]}
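A grid like this is typically searched with cross-validation. The sketch below shows one way to do that with scikit-learn's `GridSearchCV`; it is an assumption about how the dictionary could be used, not a copy of train_model.py. The toy emails, the TF-IDF step, and `cv=3` are illustrative. Note that `gamma` only affects the `rbf` kernel, so the linear candidates differ only in `C`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

tuned_param = {'kernel': ['linear', 'rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000, 10000]}

# Hypothetical toy data; replace with your own training texts and labels.
texts = [
    "win a free prize now", "free money click here", "claim your free prize",
    "cheap loans apply now", "urgent offer click now", "free vacation winner",
    "meeting moved to friday", "please review the report", "lunch with the team",
    "project update attached", "see you at the meeting", "notes from the review",
]
labels = ["spam"] * 6 + ["ham"] * 6

# Keeping the vectorizer inside the pipeline means it is re-fitted on each
# training fold, so no test-fold vocabulary leaks into training.
pipeline = make_pipeline(TfidfVectorizer(), SVC())

# make_pipeline names the SVC step "svc", so grid keys need the "svc__" prefix.
param_grid = {f"svc__{name}": values for name, values in tuned_param.items()}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.2f}")
```

After fitting, `search.best_estimator_` is a pipeline refitted on all the training data with the best parameters, and it can be used directly for prediction.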
- The model reaches an accuracy of 86%. We could not achieve higher accuracy because the dataset is small.