Feature-engineering analysis of Spotify track data to study how audio features relate to track popularity. The project compares models built on original Spotify features versus engineered composite features, then examines genre-specific behavior. Main deliverable: a rendered report from presentation.Rmd.
- File: spotify.csv
- Size: ~114,000 tracks (114,001 lines including header)
- Content: track metadata + audio features + popularity target
- Key fields: track_id, artists, track_name, popularity, danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, tempo, track_genre
- Note: the CSV includes a leading unnamed index-like column
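The load-and-clean step can be sketched as below. This is illustrative only: it writes a two-row stand-in for spotify.csv to a temp file, then reads it back and drops the leading unnamed index column (which read.csv names "X").

```r
# Hypothetical stand-in for spotify.csv, including the unnamed index column.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"","track_id","popularity","danceability","energy"',
  '"0","t1",73,0.676,0.461',
  '"1","t2",55,0.420,0.166'
), csv_path)

spotify <- read.csv(csv_path)
# read.csv assigns the name "X" to the empty-header first column; drop it.
spotify <- spotify[, setdiff(names(spotify), "X")]
names(spotify)  # "track_id" "popularity" "danceability" "energy"
```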
The report follows this sequence:
- Load and inspect Spotify data.
- Explore distributions/correlations of original audio features.
- Engineer composite features.
- Compare engineered vs original features with correlation and linear models.
- Run simplified feature-selection comparisons.
- Analyze genre-specific differences and model behavior.
- Translate findings into practical production-oriented interpretation.
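The original-versus-engineered comparison step can be sketched as follows. The data-generating process here is invented for illustration; only the feature names mirror the dataset, and the actual report works on the real Spotify data.

```r
# Synthetic data standing in for the Spotify features (assumption, not real data).
set.seed(1)
n <- 500
d <- data.frame(energy = runif(n), valence = runif(n), danceability = runif(n))
d$mood_score  <- (d$valence + d$danceability) / 2           # engineered composite
d$popularity  <- 40 + 30 * d$mood_score + rnorm(n, sd = 5)  # toy target

# Correlation of each candidate feature with the target
sapply(d[c("energy", "valence", "danceability", "mood_score")],
       cor, y = d$popularity)

# Linear models on original vs engineered features, compared by adjusted R^2
fit_orig <- lm(popularity ~ energy + valence + danceability, data = d)
fit_eng  <- lm(popularity ~ mood_score, data = d)
c(original   = summary(fit_orig)$adj.r.squared,
  engineered = summary(fit_eng)$adj.r.squared)
```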
Engineered features used in the analysis:
- energy_loudness_ratio
- mood_score
- acoustic_electronic_balance
- human_presence
- complexity_score
- energetic_dance_factor
- Derived categories such as energy_level, valence_category, and genre-normalized danceability
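Two of the composites and one derived category might be computed as below. The exact formulas are assumptions for illustration, not taken from presentation.Rmd.

```r
# Toy rows mimicking the Spotify feature columns (values invented).
tracks <- data.frame(
  energy       = c(0.46, 0.17, 0.91),
  loudness     = c(-6.7, -17.2, -3.2),  # dB, typically negative
  valence      = c(0.71, 0.27, 0.62),
  danceability = c(0.68, 0.42, 0.55)
)
# Assumed form: energy relative to loudness, shifted so the denominator stays positive.
tracks$energy_loudness_ratio <- tracks$energy / (tracks$loudness + 60)
# Assumed form: a simple positive-affect composite.
tracks$mood_score <- (tracks$valence + tracks$danceability) / 2
# Discretized energy, as one plausible definition of the energy_level category.
tracks$energy_level <- cut(tracks$energy, breaks = c(0, 0.33, 0.66, 1),
                           labels = c("low", "medium", "high"))
```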
Prerequisites:
- R installed
- A LaTeX renderer such as pdflatex
From the repository root:
R -e "install.packages('renv', repos='https://cloud.r-project.org')"
R -e "renv::restore()"
Rscript -e "rmarkdown::render('presentation.Rmd')"
Expected output:
- presentation.pdf (created or updated)
Limitations:
- Observational analysis of one dataset; no causal claims.
- Popularity is modeled from available audio/metadata features only.
- Feature-engineering impact is not uniform across genres.
- Results depend on dataset composition and preprocessing choices.