Spotify Song Popularity Predictor

Author: Paarth Sharma

An end-to-end machine learning project that predicts the popularity score of a Spotify track from structured audio features and encoded genre information. This repository includes preprocessing scripts, model training, evaluation, and an interactive Streamlit app that ties everything together.

Dataset: https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset

Overview

This project explores whether it is possible to predict Spotify track popularity using only numeric features such as danceability, tempo, valence, loudness, and other audio indicators. The workflow covers data cleaning, encoding, scaling, model training, cross-validation, evaluation, and deployment.

The final result is a working ML pipeline with a user-friendly UI that demonstrates predictions, dataset insights, model performance, and feature importances.

Features

Full preprocessing pipeline (cleaning, filtering, encoding, scaling, splitting)
Genre encoding using LabelEncoder
Standardization of all numeric features
GradientBoostingRegressor with 5-fold cross-validation
Separate train/validation/test splits
Streamlit application with:
- Popularity prediction
- Dataset preview
- Heatmaps and scatter plots
- Feature importance visualization
- Actual vs predicted comparison

Tech Stack

Python
pandas
numpy
scikit-learn (LabelEncoder, StandardScaler, GradientBoostingRegressor, KFold)
Streamlit
Plotly
joblib

Pipeline Breakdown

1. Preprocessing (`data_processing.py`)

Drops duplicates and missing popularity rows
Forward/backward fills remaining missing values
Removes rows where all audio features are zero
Converts explicit into 0/1
Encodes track_genre
Selects numeric columns
Scales all features
Splits data into train, validation, and test
Saves processed datasets and transformation models
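
The encoding and scaling steps above can be sketched without any dependencies. This is a minimal illustration, not the contents of `data_processing.py`: the helper names (`label_encode`, `standardize`) and the sample rows are hypothetical. The repository itself uses scikit-learn's LabelEncoder and StandardScaler, which compute the same quantities (sorted class indices, and zero mean with unit population standard deviation).

```python
from statistics import fmean, pstdev


def label_encode(values):
    """Map each distinct label to an integer index, in sorted order."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


def standardize(column):
    """Rescale a numeric column to zero mean and unit population std."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Hypothetical sample rows, for illustration only.
rows = [
    {"track_genre": "rock", "explicit": True, "tempo": 120.0},
    {"track_genre": "jazz", "explicit": False, "tempo": 90.0},
    {"track_genre": "rock", "explicit": False, "tempo": 150.0},
]

# Encode genre as integers, explicit as 0/1, and scale tempo.
genre_codes, genre_classes = label_encode([r["track_genre"] for r in rows])
explicit_flags = [int(r["explicit"]) for r in rows]
tempo_scaled = standardize([r["tempo"] for r in rows])
```

A common design choice is to compute the scaling statistics on the training split only and reuse them for validation and test data, so that no information from held-out rows reaches the model.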

2. Training (`train_model.py`)

Loads processed datasets
Runs 5-fold cross-validation
Computes RMSE per fold
Trains the final model on the combined training and validation sets
Saves final model to models/
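
The fold mechanics behind the cross-validation step can be sketched in plain Python. This is an assumption-laden illustration rather than the code in `train_model.py`: a predictor that always outputs the training mean stands in for the GradientBoostingRegressor, and the function names, seed, and popularity values are hypothetical.

```python
import math
import random


def k_fold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, val_idx


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validate_mean_baseline(y, n_splits=5):
    """Return one RMSE per fold for a stand-in model that predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), n_splits):
        train_mean = sum(y[i] for i in train_idx) / len(train_idx)
        y_val = [y[i] for i in val_idx]
        scores.append(rmse(y_val, [train_mean] * len(y_val)))
    return scores


# Hypothetical popularity scores (0-100), for illustration only.
popularity = [12, 55, 43, 70, 8, 33, 61, 27, 49, 90, 5, 38]
fold_rmse = cross_validate_mean_baseline(popularity)
print([round(s, 2) for s in fold_rmse])
```

Swapping the mean baseline for a real regressor only changes the two lines that fit and predict; the splitting and per-fold scoring stay the same.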

3. Evaluation (`evaluate_model.py`)

Loads the trained model
Evaluates performance on the test set
Prints RMSE, MAE, and R²
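
For reference, the three reported metrics can be computed by hand as follows. This is a minimal sketch with a hypothetical helper name (`regression_metrics`) and made-up values; `evaluate_model.py` may compute them differently, for example through scikit-learn's metric functions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R² is undefined when every target is identical (ss_tot == 0).
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical targets and predictions, for illustration only.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

R² compares the model against a baseline that always predicts the mean of the targets: a value of 0 means no improvement over that baseline, and 1 means a perfect fit.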

4. Streamlit Application (`app.py`)

Loads the trained model, scaler, and label encoder
Accepts track features from the user in a clean UI
Encodes genre on the fly
Predicts popularity
Displays statistics, heatmaps, scatter plots, and feature importances

Model Performance

Popularity is extremely difficult to predict from audio features alone. Many real-world factors are not in the dataset (artist popularity, playlist placement, promotion, viral trends).

Given that, the model’s performance is within expected ranges:

RMSE: around 14
MAE: around 11
R²: roughly 0.35–0.40

The goal of this project is not perfect accuracy but building a full ML pipeline and deployment.

Running the Project

To run the project, start Docker on your machine, then build and run the image from the repository root:

docker build -t spotify-ml .

docker run -p 8501:8501 spotify-ml

With the port mapping above, the Streamlit app should be reachable at http://localhost:8501.
