aileks/spotify-data-analysis

Spotify Data Analysis: Engineering Musical Success

Feature-engineering analysis of Spotify track data to study how audio features relate to track popularity. The project compares models built on original Spotify features versus engineered composite features, then examines genre-specific behavior. Main deliverable: a rendered report from presentation.Rmd.

Dataset

  • File: spotify.csv
  • Size: ~114,000 tracks (114,001 lines including header)
  • Content: track metadata + audio features + popularity target
  • Key fields include track_id, artists, track_name, popularity, danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, tempo, track_genre
  • Note: CSV includes a leading unnamed index-like column
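A minimal sketch of loading the file and handling the index-like column, assuming that column comes first and has a blank header (base R's read.csv() renames a blank header to X). The helper name load_spotify is hypothetical and is not taken from presentation.Rmd.

```r
# Hypothetical helper: read the dataset and drop the leading index-like column.
load_spotify <- function(path = "spotify.csv") {
  tracks <- read.csv(path, stringsAsFactors = FALSE)

  # Assumption: the unnamed column is first. read.csv() turns a blank
  # header into "X", so check for both spellings before dropping it.
  if (names(tracks)[1] %in% c("X", "")) {
    tracks <- tracks[, -1, drop = FALSE]
  }
  tracks
}

# Usage, from the repository root:
# tracks <- load_spotify()
# str(tracks)
```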

Analysis Workflow

The report follows this sequence:

  1. Load and inspect Spotify data.
  2. Explore distributions/correlations of original audio features.
  3. Engineer composite features.
  4. Compare engineered vs original features with correlation and linear models.
  5. Run simplified feature-selection comparisons.
  6. Analyze genre-specific differences and model behavior.
  7. Translate findings into a practical, production-oriented interpretation.
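Step 4 can be sketched as fitting one linear model per feature set and comparing adjusted R-squared. This is an illustration under assumptions, not the report's code: the function name compare_feature_sets and the choice of adjusted R-squared as the comparison metric are hypothetical.

```r
# Fit one linear model per feature set and return adjusted R-squared for each.
# `target` and the entries of `feature_sets` are column names in `data`.
compare_feature_sets <- function(data, target, feature_sets) {
  sapply(feature_sets, function(features) {
    fit <- lm(reformulate(features, response = target), data = data)
    summary(fit)$adj.r.squared
  })
}

# Usage, assuming `tracks` already holds the engineered columns:
# compare_feature_sets(
#   tracks,
#   target = "popularity",
#   feature_sets = list(
#     original   = c("danceability", "energy", "loudness", "valence"),
#     engineered = c("mood_score", "energetic_dance_factor")
#   )
# )
```

Adjusted R-squared penalizes extra predictors, which keeps the comparison fair when the two feature sets differ in size.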

Engineered features used in the analysis:

  • energy_loudness_ratio
  • mood_score
  • acoustic_electronic_balance
  • human_presence
  • complexity_score
  • energetic_dance_factor
  • Derived categories such as energy_level, valence_category, and genre-normalized danceability
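The sketch below shows what three of these derivations could look like in base R. The formulas, cut points, and the column name danceability_genre_z are illustrative assumptions. The actual definitions live in presentation.Rmd and may differ.

```r
# Hypothetical definitions: the formulas and thresholds below are
# illustrative assumptions, not the ones used in presentation.Rmd.
add_example_features <- function(tracks) {
  # Product of two 0-1 audio features (assumed formula).
  tracks$energetic_dance_factor <- tracks$energy * tracks$danceability

  # Bin energy into three ordered levels (assumed thresholds).
  tracks$energy_level <- cut(
    tracks$energy,
    breaks = c(0, 1 / 3, 2 / 3, 1),
    labels = c("low", "medium", "high"),
    include.lowest = TRUE
  )

  # Z-score danceability within each genre (assumed normalization).
  # Genres with a single track yield NA because sd() is undefined there.
  tracks$danceability_genre_z <- ave(
    tracks$danceability,
    tracks$track_genre,
    FUN = function(x) (x - mean(x)) / sd(x)
  )
  tracks
}
```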

Quick Start

Prerequisites:

  • R installed
  • A LaTeX engine such as pdflatex (used to render the PDF output)

From the repository root:

R -e "install.packages('renv', repos='https://cloud.r-project.org')"
R -e "renv::restore()"
Rscript -e "rmarkdown::render('presentation.Rmd')"

Expected output:

  • presentation.pdf (created or updated)

Scope and Limitations

  • Observational analysis of one dataset; no causal claims.
  • Popularity is modeled from available audio/metadata features only.
  • Feature-engineering impact is not uniform across genres.
  • Results depend on dataset composition and preprocessing choices.

About

Project for ASU's DAT 301 course. Data analysis of a Spotify track dataset focused on feature engineering comparisons.
