Suno_Slayers

UVic Spring 2026 CSC475 Project

Coordinator

Group Members

Nathan Pannell

Focus: Feature Extraction and Experimentation

  • Objective 1: Perform a comprehensive literature review of leading AI-classification techniques and compile a shortlist of methods to test.

    • PI1 (basic): Determine which basic MIR features or representations are commonly used in popular AI-detection papers.
    • PI2 (basic): Find 1-2 simple techniques that have been proven to lead to positive performance gains (lyric transcription, common artifacts).
    • PI3 (expected): Create a shortlist of 3-5 papers that provide features which potentially complement the SONICS proposed technique.
    • PI4 (advanced): Using papers on creating GenAI music, identify limitations mentioned as additional areas of further study.
    • PI5 (advanced): Identify opportunities to combine techniques from different, leading-edge papers. This will be helpful during final integration.
  • Objective 2: Experimentally replicate selected classification methods and integrate into a simple classification model.

    • PI1 (basic): Build pipeline to extract MIR features from dataset.
    • PI2 (basic): Create an SVM with MIR features and measure performance improvements for each new input feature.
    • PI3 (expected): Reproduce results for shortlisted papers (using publicly available code released by authors) and validate performance on our dataset.
    • PI4 (expected): Extend the SVM with features derived from these new techniques that have been reproduced.
    • PI5 (advanced): Combine techniques, beyond feature extraction, with the baseline SVM to improve performance.
  • Objective 3: Apply findings from experimentation to CNN and transformer models.

    • PI1 (basic): Isolate the specific features that were effective on the SVM.
    • PI2 (expected): Work with Ryan to apply these new features and data pipelines to the Transformer baseline.
    • PI3 (expected): Work with Devanshu to apply these new features and data pipelines to the CNN baseline.
    • PI4 (advanced): Apply other techniques (from 1.5.) to both of the baselines.
    • PI5 (advanced): Run final validation tests on resulting models from 3.2., 3.3., and 3.4., along with the SVM from 2.2.
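
As a rough illustration of Objective 2's PI1-PI2 (extract MIR features, feed them to an SVM, measure performance), the pipeline could be sketched as below. This is a minimal, hypothetical sketch: the two features (zero-crossing rate and spectral centroid) and the synthetic stand-in clips are illustrative assumptions, not the project's actual feature set or data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

def extract_features(signal, sr=22050):
    """Two simple MIR features: zero-crossing rate and spectral centroid."""
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2.0
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([zcr, centroid])

# Toy stand-in clips: "human" = clean low tone, "AI" = noisy high tone.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 22050, endpoint=False)
signals, labels = [], []
for _ in range(40):
    signals.append(np.sin(2 * np.pi * 220 * t) + 0.01 * rng.standard_normal(t.size))
    labels.append(0)
    signals.append(np.sin(2 * np.pi * 4000 * t) + 0.2 * rng.standard_normal(t.size))
    labels.append(1)

X = np.stack([extract_features(s) for s in signals])
y = np.array(labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = SVC(kernel="rbf").fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

In practice each new feature would be added to `extract_features` one at a time, re-fitting the SVM to measure the marginal gain, per PI2.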

Ryan Dreher

Focus: Baseline (Transformer)

  • Objective (develop data preprocessing pipeline and establish baseline classification models)

    • PI1 (basic): implement simple baseline classifier (MLP or SVM) on spectrogram features
    • PI2 (basic): implement spectrogram conversion pipeline matching SONICS paper specifications
    • PI3 (expected): create data splits and implement augmentation techniques (pitch shifting, time stretching, filtering)
    • PI4 (expected): evaluate baseline performance and identify data quality issues
    • PI5 (advanced): optimize data loading with parallel processing and caching
  • Objective (implement transformer baseline and achieve parity with SONICS starting point)

    • PI1 (basic): implement transformer architecture for spectrogram inputs
    • PI2 (basic): set up training loop with optimization and logging
    • PI3 (expected): train transformer and reproduce SONICS baseline results
    • PI4 (expected): compare transformer performance with Devanshu's CNN baseline
    • PI5 (advanced): experiment with transformer parameters and analyze attention patterns
  • Objective (evaluate model robustness and compare baseline approaches)

    • PI1 (basic): evaluate transformer and CNN baselines on test data
    • PI2 (basic): test model performance on modified audio inputs (different sample rates, compression, filtering)
    • PI3 (expected): analyze differences between transformer and CNN performance
    • PI4 (expected): visualize transformer attention to identify important spectrogram regions
    • PI5 (advanced): compare computational requirements for transformer vs CNN
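
A rough sketch of the spectrogram conversion and augmentation steps above (PI2-PI3 of the first objective). This uses a plain NumPy STFT as a stand-in for the mel-spectrogram front end the SONICS paper specifies, and random gain plus additive noise as placeholder augmentations; proper pitch shifting and time stretching would require a resampler (e.g., librosa), omitted here.

```python
import numpy as np

def log_spectrogram(wave, n_fft=512, hop=256):
    """Windowed STFT magnitude in dB -- a stand-in for the mel-spectrogram
    front end specified in the SONICS paper."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, n_frames)
    return 20 * np.log10(mag + 1e-10)

def augment(wave, rng):
    """Cheap augmentations: random gain and additive noise. (Pitch shifting
    and time stretching would need a resampler such as librosa's.)"""
    return rng.uniform(0.5, 1.5) * wave + 0.005 * rng.standard_normal(wave.size)

rng = np.random.default_rng(42)
sr = 16000
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of A4
spec = log_spectrogram(augment(wave, rng))            # shape (257, 61)
```

The per-frame loop is the part PI5 would parallelize and cache once the pipeline is validated.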

Devanshu Makwana

Focus: CNN Development

  • Objective 1: Develop CNN for music classification

    • PI1 (basic): Design and implement a CNN architecture tailored for audio classification
    • PI2 (basic): Prepare and preprocess dataset for CNN input (e.g., spectrograms)
    • PI3 (expected): Optimize CNN architecture for improved accuracy and robustness
    • PI4 (expected): Experiment with advanced techniques like dropout, batch normalization, and residual connections
    • PI5 (advanced): Compare CNN results to transformer baseline and published SONICS results
  • Objective 2: Explore MIR features for CNN architecture

    • PI1 (basic): Identify and extract relevant MIR features from dataset
    • PI2 (basic): Analyze MIR features for predictive capability in distinguishing AI-generated music
    • PI3 (expected): Develop a pipeline to preprocess MIR features for compatibility with CNN inputs
    • PI4 (expected): Train and evaluate CNN with MIR features, comparing to baseline
    • PI5 (advanced): Optimize MIR feature integration for improved classification accuracy
  • Objective 3: Evaluate and document CNN performance

    • PI1 (basic): Generate classification reports (accuracy, precision, recall)
    • PI2 (basic): Visualize confusion matrix for CNN predictions
    • PI3 (expected): Analyze model errors and identify common misclassifications
    • PI4 (expected): Benchmark CNN against alternative classifiers
    • PI5 (advanced): Document findings and propose improvements for future iterations
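
Objective 3's PI1-PI2 (classification reports and a confusion matrix) map directly onto scikit-learn utilities. The label arrays below are made-up stand-ins, not real model output:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical predictions from a trained CNN on a small test split
# (0 = human-made, 1 = AI-generated); real labels would come from our dataset.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)  # rows = true class, cols = predicted
report = classification_report(y_true, y_pred, target_names=["human", "ai"])
print(cm)
print(report)
```

`classification_report` gives the per-class precision and recall PI1 asks for, and the off-diagonal cells of `cm` are exactly the misclassifications PI3 would analyze.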

Design / Requirement Spec

Project Description

Main Project Goal: Distinguish AI-generated music (from platforms like Suno and Udio) from human-created music.

Plan: We will develop two pieces in parallel. First, we will replicate the architecture used in the SONICS paper, which takes a spectrogram and uses a transformer to classify a song as genAI or non-genAI. Second, we will extract a wide range of MIR features and test them as potential indicators of whether a song is synthetic. Finally, we will incorporate insights from MIR feature extraction into the transformer foundation to attempt to surpass the paper's findings.

Considerations: Many papers note that synthetic audio detection struggles when the input is modified (e.g., sample rate or filtering). We must use augmented data to ensure our models have a chance against audio outside our dataset, especially since the field of "AI music" is changing rapidly, so any fixed system would likely become outdated quickly.

Timeline / Objectives

Objective 1: Identify (or create) primary dataset

Target Date: February 10th

Description: We'll need a relatively clean dataset containing Suno- or Udio-generated audio tracks and human-created audio tracks. We should keep diversity and a mix of genres in mind; it's unlikely that both domains will have the same frequency of specific genres. Note that we will focus on music that includes vocals: instrumental AI music is, anecdotally, much trickier to identify.

Objective 2A (baseline team): Reproduce SpecTTTra architecture (or another baseline model)

Target Date: March 1st

Description: By re-creating a baseline classifier model (particularly one based on a proven system), we can validate our implementation against the performance claims of the SONICS paper.

Objective 2B (feature extraction team): Identify useful features with MIR (and build a pipeline to efficiently extract them)

Target Date: March 1st

Description: Exploratory in nature. Apply a wide range of MIR techniques to both the AI and non-AI data to find features that have a substantive predictive capability. Since we're looking for the features themselves to contain the information (not necessarily the model's weights), simple models like SVMs or even random forests can be used as a proof-of-concept.
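
The proof-of-concept idea above, where the features themselves carry the signal rather than the model's weights, is exactly what a random forest's feature importances surface. A minimal sketch on fabricated data (the feature matrix below is invented for illustration; column 0 plays the role of a predictive MIR feature, column 1 a useless one):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Fabricated feature matrix: column 0 separates the classes, column 1 is noise.
rng = np.random.default_rng(1)
n = 200
informative = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n)])
noise = rng.standard_normal(2 * n)
X = np.column_stack([informative, noise])
y = np.array([0] * n + [1] * n)  # 0 = human, 1 = AI

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_  # informative column should dominate
```

Ranking real MIR features by `importances` would give a quick, interpretable shortlist for Objective 3 without committing to any deep model.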

Objective 3: Extend baseline model with extracted MIR features

Target Date: April 1st

Description: Add features identified in Objective 2B as additional inputs to the transformer built in Objective 2A. Keep only the features that improve the model's accuracy, and see whether we can surpass the original performance reported in the SONICS paper.

Objective 4: General refinements and iteration

Target Date: ~End of term

Description: Build upon the extended baseline model from Objective 3. This can include parameter tuning and further accuracy testing.

Bibliography (Target 15-20 References)

  1. Main Inspiration (SONICS): [2408.14080] SONICS: Synthetic Or Not -- Identifying Counterfeit Songs

  2. Discussion of risks to AI-music detection: ISMIR 2025: The AI Music Arms Race: On the Detection of AI-Generated Music

  3. Promising look into specific artifacts created by AI music generators. Leads to a powerful classifier using interpretable criteria rather than deep learning: [2506.19108] A Fourier Explanation of AI-music Artifacts

  4. From Deezer! Shows that transcribing lyrics and using them to detect whether a song was written by AI is more effective than audio-based detection methods: ISMIR 2025: AI-Generated Song Detection via Lyrics Transcripts

  5. Some key things to watch for; concerns and further areas of research regarding AI detection: https://arxiv.org/abs/2501.10111v1

  6. Case study into how Suno and Udio are being used. Could hold key findings around trends in output: https://arxiv.org/abs/2509.11824v1

  7. This research paper identifies how AI music is perceived by humans. It explores a Turing-like test to see which specific audio cues humans use to identify AI: https://arxiv.org/abs/2509.25601

  8. Addresses interpretable criteria for identifying AI music. It analyzes the latent spaces of models, mapping them to acoustic properties like pitch, loudness, and timbre: https://arxiv.org/abs/2510.23802

  9. Most classifiers, like SONICS, look at the texture of the sound; this paper argues that models like Suno have mastered texture but often fail at structure. It outlines a two-part framework: part one extracts segment-level musical features, and part two feeds these segments into a "Segment Transformer": https://arxiv.org/abs/2509.08283

  10. MIR paper that extracts a wide range of MIR features and explains why those specific spectral and temporal features were chosen: https://www.researchgate.net/publication/3333877_Musical_Genre_Classification_of_Audio_Signals

  11. This paper bridges the gap between traditional spectrogram analysis and modern deep learning. It demonstrates how convolutional neural networks extract features from mel-spectrograms: https://arxiv.org/abs/1606.00298

  12. https://www.sciencedirect.com/science/article/pii/S1051200424005803
    Short survey of how transformers are used in audio detection, focusing on spectrogram inputs and design choices for classification models.

  13. https://dl.acm.org/doi/10.1109/IST55454.2022.9827729
    Proposes a transformer that takes spectrogram patches as tokens and shows improved accuracy over CNN baselines on audio tasks.

  14. https://onlinelibrary.wiley.com/doi/10.1155/2021/1651560
    Converts music to time–frequency images, then applies a convolutional network to learn spectral and temporal patterns for music classification

  15. https://www.sciencedirect.com/topics/computer-science/music-information-retrieval
    Introductory article summarizing common MIR features such as MFCCs, chroma, and rhythm descriptors and explaining what musical properties they capture.

  16. https://arxiv.org/html/2405.05244v1
    Describes a benchmark and competition for detecting fake singing voices, including datasets, task setup, and baseline system performance.

  17. https://arxiv.org/abs/2309.07525
    Presents a dataset of genuine and synthetic singing, along with evaluations of several anti‑spoofing models on this material.

  18. https://arxiv.org/html/2410.04324v1
    Introduces a unified framework and benchmark for detecting multiple kinds of AI‑generated audio, with experiments on generalization and robustness.

  19. https://www.academia.edu/50775726/IRJET_Music_Information_Retrieval_and_Classification_using_Deep_Learning
    Explores MIR tasks using MFCCs and other features as inputs to deep networks for genre and mood classification.
