
Added script to download real_songs from csv #8

Open
SalmanQureshi97 wants to merge 2 commits into awsaf49:main from SalmanQureshi97:main


SalmanQureshi97 commented Feb 3, 2026

This PR adds a script for downloading real songs from YouTube using the provided real_songs.csv, following the same principles outlined in the SONICS dataset construction and reproducibility statement.

The script uses yt-dlp to fetch audio directly from YouTube links/IDs, extracting audio-only files in a deterministic and reproducible manner. This mirrors the original SONICS pipeline, where real songs are dynamically retrieved from YouTube and not redistributed as part of the dataset.

Usage

  1. Install the required dependency:
     pip install yt-dlp
  2. Run the script from the SONICS Hugging Face directory:
     https://huggingface.co/datasets/awsaf49/sonics

The script reads real_songs.csv and downloads the corresponding YouTube audio files locally.
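
For readers who want a feel for the flow, here is a minimal sketch of reading the CSV and handing the links to yt-dlp. It is not the script from this PR. The column name `youtube_id`, the output folder `real_songs`, and the helper names are assumptions made for illustration; check the actual header of real_songs.csv before relying on them.

```python
import csv
from pathlib import Path


def read_youtube_ids(csv_path, id_column="youtube_id"):
    """Return non-empty, de-duplicated IDs from the CSV, in file order.

    The column name "youtube_id" is an assumption, not confirmed by the PR.
    """
    ids = []
    seen = set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            video_id = (row.get(id_column) or "").strip()
            if video_id and video_id not in seen:
                seen.add(video_id)
                ids.append(video_id)
    return ids


def to_watch_url(video_id):
    """Accept either a bare ID or a full link and return a URL."""
    if video_id.startswith(("http://", "https://")):
        return video_id
    return f"https://www.youtube.com/watch?v={video_id}"


def download_audio(csv_path="real_songs.csv", out_dir="real_songs"):
    """Download audio for every ID in the CSV (hypothetical helper)."""
    # Imported lazily so the helpers above work without yt-dlp installed.
    from yt_dlp import YoutubeDL

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    options = {
        "format": "bestaudio/best",               # prefer an audio-only stream
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",   # name each file by video ID
        "ignoreerrors": True,                     # keep going past failed videos
        "noplaylist": True,                       # fetch single videos only
    }
    urls = [to_watch_url(v) for v in read_youtube_ids(csv_path)]
    with YoutubeDL(options) as ydl:
        return ydl.download(urls)
```

Calling `download_audio()` from the directory that contains real_songs.csv would start the downloads. Naming files by video ID keeps the mapping back to the CSV rows unambiguous.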

Notes on Reproducibility

  • Only YouTube IDs/links are used; audio files are not redistributed.
  • Inactive or unavailable videos are skipped, consistent with the original dataset creation process.
  • Users can regenerate the real-song portion of the dataset independently, ensuring transparency and reproducibility.
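
The skip behavior in the notes above can be sketched as a loop that records failures instead of stopping. This is an illustration under assumptions, not the PR's code: `download_all` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever performs one download (for example a thin wrapper around yt-dlp).

```python
def download_all(video_ids, fetch):
    """Try each ID once; collect successes and skipped IDs separately.

    `fetch` is a hypothetical callable that downloads one video and
    raises an exception when the video is inactive or unavailable.
    """
    downloaded = []
    skipped = []
    for video_id in video_ids:
        try:
            fetch(video_id)
        except Exception as err:
            # Record the reason so the missing songs can be reported later.
            skipped.append((video_id, str(err)))
        else:
            downloaded.append(video_id)
    return downloaded, skipped
```

Keeping the skipped IDs and their error messages makes it possible to report exactly which rows of real_songs.csv could not be regenerated on a given date.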
