GitHub - arry-codes/qpAdm-notebook: First-of-its-kind 'qpAdm wrapper' - A professional Genomic Admixture Tool, using Jupyter Notebook

🧬 qpAdm Wrapper - Genomic Admixture Modeling Pipeline

For a detailed introduction to qpAdm, including what it is, how it works, and what it is used for, see the Tutorial.

Overview

This repository provides a first-of-its-kind scalable qpAdm wrapper that enables a fully reproducible admixture analysis pipeline through a unified Jupyter Notebook interface.

It removes traditional Linux/R setup overhead and automates SNP preprocessing, dataset merging, and qpAdm batch execution for large-scale population genetics analysis.

Key Features

Fully reproducible qpAdm workflow
Unified Jupyter Notebook execution
Automated SNP filtering (1240K SNP list)
AADR dataset compatibility
Automated batch qpAdm runs
PCA and f-statistics integration
High-dimensional SNP preprocessing using NumPy & Pandas

Dataset

This project uses the Allen Ancient DNA Resource (AADR), a standard reference dataset released by the David Reich Lab at Harvard: Link

Why AADR?

1240K SNP coverage
Extensive ancient & modern population coverage
Gold-standard dataset for qpAdm modeling

The HO (Human Origins) dataset is also supported, but AADR is recommended for its broader coverage.

How qpAdm Works (Pipeline Explanation)

We begin with raw DNA files from consumer genetic testing platforms such as:

AncestryDNA • 23andMe • Genetrack • EasyDNA • MyHeritage

These files are typically provided in .txt format. Before processing, the file must be converted into 23andMe format, which acts as the standardized input format for the pipeline.
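The conversion step can be illustrated with a minimal sketch. It assumes an AncestryDNA-style input (tab-separated columns `rsid`, `chromosome`, `position`, `allele1`, `allele2`, with `#` comment lines and `0` for no-calls) and a 23andMe-style output with a single `genotype` column. The function name `ancestry_to_23andme` and the sample rows are hypothetical; the notebook's own converter may work differently, and chromosome recoding is not handled here.

```python
import io

import pandas as pd


def ancestry_to_23andme(raw_text: str) -> pd.DataFrame:
    """Hypothetical helper: reshape AncestryDNA-style text into 23andMe-style columns."""
    # Lines starting with '#' are treated as comments; the header row is assumed
    # to be the first non-comment line.
    df = pd.read_csv(io.StringIO(raw_text), sep="\t", comment="#", dtype=str)

    # 23andMe-style files store both alleles in one column.
    # Assumption: "0" marks a no-call in the input and "--" marks one in the output.
    genotype = (df["allele1"] + df["allele2"]).replace("00", "--")

    out = df[["rsid", "chromosome", "position"]].copy()
    out["genotype"] = genotype
    return out


# Toy input, not real data.
sample = (
    "#AncestryDNA raw data download\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs3094315\t1\t752566\tA\tG\n"
    "rs12124819\t1\t776546\t0\t0\n"
)

converted = ancestry_to_23andme(sample)
print(converted.to_csv(sep="\t", index=False, header=False))
```

Other vendors use different column layouts, so each would need its own small reader before reaching this common shape.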

Step 1 – SNP Filtering (1240K SNP List)

The DNA file is filtered using the 1240K SNP list to ensure compatibility with the AADR dataset. After this step, the file matches the SNP structure of AADR.
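A minimal sketch of this filtering step with pandas follows. It assumes the personal file is already loaded as a table with an `rsid` column and that the 1240K SNP identifiers are available as a list (in practice they would likely be read from the SNP list file). The function name `filter_to_panel` and the toy identifiers are hypothetical.

```python
import pandas as pd


def filter_to_panel(genotypes: pd.DataFrame, panel_ids) -> pd.DataFrame:
    """Keep only the rows whose rsid appears in the reference SNP panel."""
    panel = set(panel_ids)
    kept = genotypes[genotypes["rsid"].isin(panel)]
    return kept.reset_index(drop=True)


# Toy data standing in for a personal genotype file.
genotypes = pd.DataFrame(
    {
        "rsid": ["rs1", "rs2", "rs3", "rs4"],
        "chromosome": ["1", "1", "2", "2"],
        "position": [100, 200, 300, 400],
        "genotype": ["AA", "AG", "CC", "--"],
    }
)

# Toy stand-in for the 1240K SNP list.
panel_ids = ["rs2", "rs4", "rs9"]

filtered = filter_to_panel(genotypes, panel_ids)
print(filtered)
print(f"{len(filtered)} of {len(panel_ids)} panel SNPs found in the input file")
```

Reporting how many panel SNPs were found is a useful sanity check, since consumer arrays cover only part of the panel.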

Step 2 – Dataset Merge

The filtered file is merged with:

AADR dataset (recommended)
OR HO dataset

AADR is preferred because it provides larger population coverage, includes more ancient samples, and offers better modeling resolution compared to alternative datasets.
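Conceptually, the merge keeps the SNPs shared by both datasets and places the new sample alongside the reference individuals. The sketch below illustrates that idea with pandas on toy genotype counts. It is not the notebook's merge procedure: a real merge is normally done with dedicated tooling and must also reconcile alleles and strand, which this example ignores. All sample and SNP names are hypothetical.

```python
import pandas as pd

# Rows are SNPs, columns are individuals, values are toy allele counts (0, 1, 2).
reference = pd.DataFrame(
    {"Ref_A": [0, 1, 2], "Ref_B": [2, 2, 0]},
    index=pd.Index(["rs1", "rs2", "rs3"], name="rsid"),
)

user = pd.DataFrame(
    {"MySample": [1, 0]},
    index=pd.Index(["rs2", "rs3"], name="rsid"),
)

# An inner join keeps only the SNPs present in both tables.
merged = reference.join(user, how="inner")
print(merged)
```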

Step 3 – qpAdm Modeling

qpAdm estimates ancestry proportions by modeling a target population as a mixture of selected source populations.
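As a rough formal sketch of this idea (the symbols are assumptions introduced here, not notation from this repository): let $T$ be the target, $S_1, \dots, S_n$ the sources with mixture proportions $\alpha_i$, and $R_1, R_j, R_k$ outgroup populations. If the model holds, the target's f4-statistics are the same weighted average of the sources' f4-statistics:

```latex
f_4(T, R_1; R_j, R_k) \approx \sum_{i=1}^{n} \alpha_i \, f_4(S_i, R_1; R_j, R_k),
\qquad \sum_{i=1}^{n} \alpha_i = 1
```

The estimated $\alpha_i$ are the reported ancestry proportions, and the p-value measures how well the relation fits across the outgroups.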

For Indian population structure analysis, major ancestral components often include:

AASI (Ancient Ancestral South Indian)
IVC (Indus Valley Civilization; Iranian farmer-related ancestry)
Steppe (Steppe pastoralist / Indo-European related ancestry)

Process:

Select target population
Choose 7–8 source populations
Define outgroup populations
Execute qpAdm runs (automated batch supported)
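The batch part of this process can be sketched as enumerating candidate models from a pool of sources. The example below assumes that each model uses a small subset (here 2 to 3) of the source pool, that the "left" list holds the target followed by the sources, and that the "right" list holds the outgroups. The subset sizes, the function name `enumerate_models`, and all population labels are hypothetical placeholders; actually running qpAdm on each model is outside this sketch.

```python
from itertools import combinations


def enumerate_models(target, source_pool, outgroups, min_sources=2, max_sources=3):
    """Build one model description per combination of candidate sources."""
    models = []
    for k in range(min_sources, max_sources + 1):
        for sources in combinations(source_pool, k):
            models.append(
                {
                    "target": target,
                    "left": [target, *sources],  # target first, then sources
                    "right": list(outgroups),    # same outgroups for every model
                }
            )
    return models


# Placeholder labels, not real dataset population names.
source_pool = [f"Source{i}" for i in range(1, 8)]   # 7 candidate sources
outgroups = [f"Outgroup{i}" for i in range(1, 6)]

models = enumerate_models("MySample", source_pool, outgroups)
print(f"{len(models)} candidate models")
print(models[0])
```

Keeping the outgroup set fixed across models makes the resulting p-values easier to compare.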

Step 4 – Model Evaluation

P-value > 0.05 → Model passes (not rejected at the 5% significance level)
P-value < 0.05 → Model fails (rejected)
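The rule above can be applied to a table of batch results as in the following sketch. The model names and p-values are made-up toy numbers, and treating a p-value of exactly 0.05 as a fail is an assumption, since the rule does not cover that case.

```python
import pandas as pd

P_THRESHOLD = 0.05

# Toy results; in practice these would be parsed from the qpAdm outputs.
results = pd.DataFrame(
    {
        "model": ["model_1", "model_2", "model_3"],
        "p_value": [0.32, 0.004, 0.051],
    }
)

# Pass only when the p-value is strictly above the threshold.
results["status"] = results["p_value"].gt(P_THRESHOLD).map({True: "pass", False: "fail"})

passing = results[results["status"] == "pass"].sort_values("p_value", ascending=False)
print(results)
print(passing["model"].tolist())
```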

Folders and files

Filtering_Dataset
LICENSE
README.md
admixturetool.ipynb
datasetmerging.ipynb
