Suno_Slayers

UVic Spring 2026 CSC475 Project

Coordinator

Group Members

Nathan Pannell

Focus: Feature Extraction and Experimentation

  • Objective 1: Perform a comprehensive literature review of leading AI-classification techniques and compile a shortlist of methods to test.

    • PI1 (basic): Determine which basic MIR features or representations are commonly used in popular AI-detection papers.
    • PI2 (basic): Find 1-2 simple techniques that have been proven to lead to positive performance gains (lyric transcription, common artifacts).
    • PI3 (expected): Create a shortlist of 3-5 papers that provide features which potentially complement the SONICS proposed technique.
    • PI4 (advanced): Using papers on creating GenAI music, identify limitations mentioned as additional areas of further study.
    • PI5 (advanced): Identify opportunities to combine techniques from different, leading-edge papers. This will be helpful during final integration.
  • Objective 2: Experimentally replicate selected classification methods and integrate into a simple classification model.

    • PI1 (basic): Build pipeline to extract MIR features from dataset.
    • PI2 (basic): Create an SVM with MIR features and measure performance improvements for each new input feature.
    • PI3 (expected): Reproduce results for shortlisted papers (using publicly available code released by authors) and validate performance on our dataset.
    • PI4 (expected): Extend the SVM with features derived from these new techniques that have been reproduced.
    • PI5 (advanced): Combine techniques, beyond feature extraction, with the baseline SVM to improve performance.
  • Objective 3: Apply findings from experimentation to CNN and transformer models.

    • PI1 (basic): Isolate the specific features that were effective on the SVM.
    • PI2 (expected): Work with Ryan to apply these new features and data pipelines to the Transformer baseline.
    • PI3 (expected): Work with Devanshu to apply these new features and data pipelines to the CNN baseline.
    • PI4 (advanced): Apply other techniques (from 1.5.) to both of the baselines.
    • PI5 (advanced): Run final validation tests on resulting models from 3.2., 3.3., and 3.4., along with the SVM from 2.2.
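
As a rough illustration of Objective 2's PI1-PI2 (extract MIR features, feed them to an SVM, measure performance), the pipeline could be sketched as below. This is a minimal, hypothetical sketch: the two features (zero-crossing rate and spectral centroid) and the synthetic stand-in clips are illustrative assumptions, not the project's actual feature set or data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

def extract_features(signal, sr=22050):
    """Two simple MIR features: zero-crossing rate and spectral centroid."""
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2.0
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([zcr, centroid])

# Toy stand-in clips: "human" = clean low tone, "AI" = noisy high tone.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 22050, endpoint=False)
signals, labels = [], []
for _ in range(40):
    signals.append(np.sin(2 * np.pi * 220 * t) + 0.01 * rng.standard_normal(t.size))
    labels.append(0)
    signals.append(np.sin(2 * np.pi * 4000 * t) + 0.2 * rng.standard_normal(t.size))
    labels.append(1)

X = np.stack([extract_features(s) for s in signals])
y = np.array(labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = SVC(kernel="rbf").fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

In practice each new feature would be added to `extract_features` one at a time, re-fitting the SVM to measure the marginal gain, per PI2.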

Ryan Dreher

Focus: Baseline (Transformer)

  • Objective (develop data preprocessing pipeline and establish baseline classification models)

    • PI1 (basic): implement simple baseline classifier (MLP or SVM) on spectrogram features
    • PI2 (basic): implement spectrogram conversion pipeline matching SONICS paper specifications
    • PI3 (expected): create data splits and implement augmentation techniques (pitch shifting, time stretching, filtering)
    • PI4 (expected): evaluate baseline performance and identify data quality issues
    • PI5 (advanced): optimize data loading with parallel processing and caching
  • Objective (implement transformer baseline and achieve parity with SONICS starting point)

    • PI1 (basic): implement transformer architecture for spectrogram inputs
    • PI2 (basic): set up training loop with optimization and logging
    • PI3 (expected): train transformer and reproduce SONICS baseline results
    • PI4 (expected): compare transformer performance with Devanshu's CNN baseline
    • PI5 (advanced): experiment with transformer parameters and analyze attention patterns
  • Objective (evaluate model robustness and compare baseline approaches)

    • PI1 (basic): evaluate transformer and CNN baselines on test data
    • PI2 (basic): test model performance on modified audio inputs (different sample rates, compression, filtering)
    • PI3 (expected): analyze differences between transformer and CNN performance
    • PI4 (expected): visualize transformer attention to identify important spectrogram regions
    • PI5 (advanced): compare computational requirements for transformer vs CNN
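
A rough sketch of the spectrogram conversion and augmentation steps above (PI2-PI3 of the first objective). This uses a plain NumPy STFT as a stand-in for the mel-spectrogram front end the SONICS paper specifies, and random gain plus additive noise as placeholder augmentations; proper pitch shifting and time stretching would require a resampler (e.g., librosa), omitted here.

```python
import numpy as np

def log_spectrogram(wave, n_fft=512, hop=256):
    """Windowed STFT magnitude in dB -- a stand-in for the mel-spectrogram
    front end specified in the SONICS paper."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, n_frames)
    return 20 * np.log10(mag + 1e-10)

def augment(wave, rng):
    """Cheap augmentations: random gain and additive noise. (Pitch shifting
    and time stretching would need a resampler such as librosa's.)"""
    return rng.uniform(0.5, 1.5) * wave + 0.005 * rng.standard_normal(wave.size)

rng = np.random.default_rng(42)
sr = 16000
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of A4
spec = log_spectrogram(augment(wave, rng))            # shape (257, 61)
```

The per-frame loop is the part PI5 would parallelize and cache once the pipeline is validated.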

Devanshu Makwana

Focus: CNN Development

  • Objective 1: Develop CNN for music classification

    • PI1 (basic): Design and implement a CNN architecture tailored for audio classification
    • PI2 (basic): Prepare and preprocess dataset for CNN input (e.g., spectrograms)
    • PI3 (expected): Optimize CNN architecture for improved accuracy and robustness
    • PI4 (expected): Experiment with advanced techniques like dropout, batch normalization, and residual connections
    • PI5 (advanced): Compare CNN results to transformer baseline and published SONICS results
  • Objective 2: Explore MIR features for CNN architecture

    • PI1 (basic): Identify and extract relevant MIR features from dataset
    • PI2 (basic): Analyze MIR features for predictive capability in distinguishing AI-generated music
    • PI3 (expected): Develop a pipeline to preprocess MIR features for compatibility with CNN inputs
    • PI4 (expected): Train and evaluate CNN with MIR features, comparing to baseline
    • PI5 (advanced): Optimize MIR feature integration for improved classification accuracy
  • Objective 3: Evaluate and document CNN performance

    • PI1 (basic): Generate classification reports (accuracy, precision, recall)
    • PI2 (basic): Visualize confusion matrix for CNN predictions
    • PI3 (expected): Analyze model errors and identify common misclassifications
    • PI4 (expected): Benchmark CNN against alternative classifiers
    • PI5 (advanced): Document findings and propose improvements for future iterations
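
Objective 3's PI1-PI2 (classification reports and a confusion matrix) map directly onto scikit-learn utilities. The label arrays below are made-up stand-ins, not real model output:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical predictions from a trained CNN on a small test split
# (0 = human-made, 1 = AI-generated); real labels would come from our dataset.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)  # rows = true class, cols = predicted
report = classification_report(y_true, y_pred, target_names=["human", "ai"])
print(cm)
print(report)
```

`classification_report` gives the per-class precision and recall PI1 asks for, and the off-diagonal cells of `cm` are exactly the misclassifications PI3 would analyze.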

Design / Requirement Spec

Project Description

Main Project Goal: Distinguish AI-generated music (from platforms like Suno and Udio) from human-created music.

Plan: We will develop two pieces in parallel. First, we will replicate the architecture used in the SONICS paper, which takes a spectrogram and uses a transformer to classify a song as genAI or non-genAI. Second, we will extract a wide range of MIR features and test them as potential indicators of whether a song is synthetic. Finally, we will incorporate insights from MIR feature extraction into the transformer foundation to attempt to surpass the paper's findings.

Considerations: Many papers note that synthetic audio detection struggles when the input is modified (e.g., sample rate or filtering). We must use augmented data to ensure our models have a chance against audio outside our dataset, especially since the field of "AI music" is changing rapidly, so any fixed system would likely become outdated quickly.

Timeline / Objectives

Objective 1: Identify (or create) primary dataset

Target Date: February 10th

Description: We'll need a relatively clean dataset containing Suno- or Udio-generated audio tracks and human-created audio tracks. We should keep diversity and a mix of genres in mind; it's unlikely that both domains will have the same frequency of specific genres. Note that we will focus on music that includes vocals: instrumental AI music is, anecdotally, much trickier to identify.

Objective 2A (baseline team): Reproduce SpecTTTra architecture (or another baseline model)

Target Date: March 1st

Description: By re-creating a baseline classifier model (particularly one based on a proven system), we can validate our implementation against the performance claims of the SONICS paper.

Objective 2B (feature extraction team): Identify useful features with MIR (and build a pipeline to efficiently extract them)

Target Date: March 1st

Description: Exploratory in nature. Apply a wide range of MIR techniques to both the AI and non-AI data to find features that have a substantive predictive capability. Since we're looking for the features themselves to contain the information (not necessarily the model's weights), simple models like SVMs or even random forests can be used as a proof-of-concept.
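
The proof-of-concept idea above, where the features themselves carry the signal rather than the model's weights, is exactly what a random forest's feature importances surface. A minimal sketch on fabricated data (the feature matrix below is invented for illustration; column 0 plays the role of a predictive MIR feature, column 1 a useless one):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Fabricated feature matrix: column 0 separates the classes, column 1 is noise.
rng = np.random.default_rng(1)
n = 200
informative = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n)])
noise = rng.standard_normal(2 * n)
X = np.column_stack([informative, noise])
y = np.array([0] * n + [1] * n)  # 0 = human, 1 = AI

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_  # informative column should dominate
```

Ranking real MIR features by `importances` would give a quick, interpretable shortlist for Objective 3 without committing to any deep model.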

Objective 3: Extend baseline model with extracted MIR features

Target Date: April 1st

Description: Add features identified in Objective 2B as additional inputs to the transformer built in Objective 2A. Keep only the features that improve the model's accuracy, and see whether we can surpass the original performance reported in the SONICS paper.

Objective 4: General refinements and iteration

Target Date: ~End of term

Description: Build upon the extended baseline model from Objective 3. This can include parameter tuning and further accuracy testing.

Bibliography (Target 15-20 References)

  1. Main Inspiration (SONICS): [2408.14080] SONICS: Synthetic Or Not -- Identifying Counterfeit Songs

  2. Discussion of risks to AI-music detection: ISMIR 2025: The AI Music Arms Race: On the Detection of AI-Generated Music

  3. Promising look into specific artifacts created by AI music generators. Leads to a powerful classifier using interpretable criteria rather than deep learning: [2506.19108] A Fourier Explanation of AI-music Artifacts

  4. From Deezer! Shows that transcribing lyrics and using them to detect whether a song was written by AI is more effective than audio-based detection methods: ISMIR 2025: AI-Generated Song Detection via Lyrics Transcripts

  5. Some key things to watch for; concerns and further areas of research regarding AI detection: https://arxiv.org/abs/2501.10111v1

  6. Case study into how Suno and Udio are being used. Could hold key findings around trends in output: https://arxiv.org/abs/2509.11824v1

  7. This research paper identifies how AI music is perceived by humans. It explores a Turing-like test to see which specific audio cues humans use to identify AI: https://arxiv.org/abs/2509.25601

  8. Addresses interpretable criteria for identifying AI music. It analyzes the latent spaces of models, mapping them to acoustic properties like pitch, loudness, and timbre: https://arxiv.org/abs/2510.23802

  9. Most classifiers, like SONICS, look at the texture of the sound; this paper argues that models like Suno have mastered texture but often fail at structure. It outlines a two-part framework: part one extracts segment-level musical features, and part two feeds these segments into a "Segment Transformer": https://arxiv.org/abs/2509.08283

  10. MIR paper that extracts a wide range of MIR features and explains why those specific spectral and temporal features were chosen: https://www.researchgate.net/publication/3333877_Musical_Genre_Classification_of_Audio_Signals

  11. This paper bridges the gap between traditional spectrogram analysis and modern deep learning. It demonstrates how convolutional neural networks extract features from mel-spectrograms: https://arxiv.org/abs/1606.00298

  12. https://www.sciencedirect.com/science/article/pii/S1051200424005803
    Short survey of how transformers are used in audio detection, focusing on spectrogram inputs and design choices for classification models.

  13. https://dl.acm.org/doi/10.1109/IST55454.2022.9827729
    Proposes a transformer that takes spectrogram patches as tokens and shows improved accuracy over CNN baselines on audio tasks.

  14. https://onlinelibrary.wiley.com/doi/10.1155/2021/1651560
    Converts music to time–frequency images, then applies a convolutional network to learn spectral and temporal patterns for music classification

  15. https://www.sciencedirect.com/topics/computer-science/music-information-retrieval
    Introductory article summarizing common MIR features such as MFCCs, chroma, and rhythm descriptors and explaining what musical properties they capture.

  16. https://arxiv.org/html/2405.05244v1
    Describes a benchmark and competition for detecting fake singing voices, including datasets, task setup, and baseline system performance.

  17. https://arxiv.org/abs/2309.07525
    Presents a dataset of genuine and synthetic singing, along with evaluations of several anti‑spoofing models on this material.

  18. https://arxiv.org/html/2410.04324v1
    Introduces a unified framework and benchmark for detecting multiple kinds of AI‑generated audio, with experiments on generalization and robustness.

  19. https://www.academia.edu/50775726/IRJET_Music_Information_Retrieval_and_Classification_using_Deep_Learning
    Explores MIR tasks using MFCCs and other features as inputs to deep networks for genre and mood classification.
