For a detailed explanation of qpAdm, including what it is, how it works, and what it is used for, see this Tutorial.
This repository provides a first-of-its-kind scalable qpAdm wrapper that enables a fully reproducible admixture analysis pipeline through a unified Jupyter Notebook interface.
It removes traditional Linux/R setup overhead and automates SNP preprocessing, dataset merging, and qpAdm batch execution for large-scale population genetics analysis.
- Fully reproducible qpAdm workflow
- Unified Jupyter Notebook execution
- Automated SNP filtering (1240K SNP list)
- AADR dataset compatibility
- Automated batch qpAdm runs
- PCA and f-statistics integration
- High-dimensional SNP preprocessing using NumPy & Pandas
This project uses the Allen Ancient DNA Resource (AADR), a standard reference dataset released by David Reich's lab at Harvard: Link
Why AADR?
- 1240K SNP coverage
- Extensive ancient & modern population coverage
- Gold-standard dataset for qpAdm modeling
The HO (Human Origins) dataset is also supported, but AADR is recommended because of its broader coverage.
We begin with raw DNA files from consumer genetic testing platforms such as:
AncestryDNA • 23andMe • Genetrack • EasyDNA • MyHeritage
These files are usually delivered as .txt files, but their column layouts differ between providers. Before processing, each file must be converted to the 23andMe format, which serves as the pipeline's standard input format.
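Conversion rules differ by provider, so the sketch below is only an illustration for one case: an AncestryDNA-style export converted to the 23andMe layout. It assumes the commonly seen AncestryDNA columns (`rsid`, `chromosome`, `position`, `allele1`, `allele2`), numeric codes for the sex chromosomes and mtDNA, and `0` for no-calls. The function name `ancestry_to_23andme` is hypothetical and not part of this repository.

```python
# Minimal sketch: convert an AncestryDNA-style raw file to the 23andMe layout.
# Assumed input: '#' comment lines, a header line starting with "rsid",
# then tab-separated rows: rsid, chromosome, position, allele1, allele2.
# Output: tab-separated rows: rsid, chromosome, position, genotype.

# Assumed numeric codes in the input (23=X, 24=Y, 25=pseudoautosomal XY,
# 26=MT); 25 is folded into X here, which is a simplifying assumption.
CHROM_MAP = {"23": "X", "24": "Y", "25": "X", "26": "MT"}


def ancestry_to_23andme(lines):
    """Yield 23andMe-style tab-separated lines from AncestryDNA-style lines."""
    yield "# rsid\tchromosome\tposition\tgenotype"
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#") or line.startswith("rsid"):
            continue  # skip comments, blank lines and the column header
        rsid, chrom, pos, a1, a2 = line.split("\t")[:5]
        chrom = CHROM_MAP.get(chrom, chrom)
        # Assumed no-call marker "0" becomes the 23andMe no-call marker "--"
        genotype = "--" if "0" in (a1, a2) else a1 + a2
        yield f"{rsid}\t{chrom}\t{pos}\t{genotype}"


if __name__ == "__main__":
    toy_rows = [  # toy data, not real SNPs
        "#AncestryDNA raw data download\n",
        "rsid\tchromosome\tposition\tallele1\tallele2\n",
        "rs1\t1\t100\tA\tG\n",
        "rs2\t23\t200\t0\t0\n",
    ]
    for out in ancestry_to_23andme(toy_rows):
        print(out)
```

Other providers would need their own column mapping; writing one small converter per provider keeps every later step working on a single format.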
The converted file is then filtered against the 1240K SNP list so that it keeps only SNPs present in the AADR dataset. After this step, the file's SNP set is restricted to the AADR 1240K panel.
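The filtering step can be sketched as a set lookup on SNP IDs. This assumes the 1240K list is read from an EIGENSTRAT-style `.snp` file (first whitespace-separated column is the SNP ID) and that the consumer file is already in the 23andMe layout. Dropping no-calls (`--`) is a choice made in this sketch, and the helper names are hypothetical.

```python
# Minimal sketch: keep only SNPs whose IDs appear in the 1240K panel.
# Assumption: panel IDs come from an EIGENSTRAT-style .snp file whose first
# whitespace-separated column is the SNP ID.


def load_panel_ids(snp_lines):
    """Return the set of SNP IDs listed in a .snp file."""
    ids = set()
    for line in snp_lines:
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def filter_to_panel(genotype_lines, panel_ids):
    """Yield 23andMe-format rows whose rsID is in panel_ids (O(1) set lookup)."""
    for line in genotype_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # drop comments and blank lines
        fields = line.split("\t")
        rsid, genotype = fields[0], fields[3]
        # Keep panel SNPs only; no-calls ("--") carry no genotype information
        if rsid in panel_ids and genotype != "--":
            yield line


if __name__ == "__main__":
    panel = load_panel_ids(["rs1 1 0.0 100 A G", "rs3 2 0.0 300 C T"])
    rows = ["# rsid\tchromosome\tposition\tgenotype",
            "rs1\t1\t100\tAG", "rs2\t1\t150\tCC", "rs3\t2\t300\t--"]
    print(list(filter_to_panel(rows, panel)))
```

Matching on rsID is the simplest option; a more defensive variant could match on chromosome and position as well, since some panel SNPs may carry non-rsID names.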
The filtered file is then merged with one of the following reference datasets:
- AADR dataset (recommended)
- HO (Human Origins) dataset
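Conceptually, merging means adding the new sample as one more column of the reference genotype matrix. The sketch below shows that idea under several assumptions: the reference is in unpacked, text EIGENSTRAT format (AADR is distributed in a packed format that would first need converting, for example with EIGENSOFT's `convertf`); each `.geno` character counts copies of the first allele listed in the `.snp` file, with `9` for missing; and strand flips and ambiguous A/T or C/G SNPs are ignored. In practice tools such as EIGENSOFT's `mergeit` or PLINK are commonly used for this step. The helper names are hypothetical.

```python
# Minimal sketch of the merge step (NOT a replacement for mergeit/PLINK).
# Assumed reference: EIGENSTRAT .snp lines (ID, chrom, genetic pos,
# physical pos, allele 1, allele 2) and text .geno lines with one character
# per individual. Strand flips and ambiguous SNPs are not handled.


def encode_genotype(genotype, ref, alt):
    """Count copies of `ref` in a two-letter genotype; '9' if missing/unmatched."""
    if len(genotype) != 2 or any(a not in (ref, alt) for a in genotype):
        return "9"
    return str(genotype.count(ref))


def merge_sample(snp_lines, geno_lines, sample_genotypes):
    """Append one column for the new sample to every reference .geno line.

    sample_genotypes maps rsID -> two-letter genotype such as "AG".
    """
    merged = []
    for snp_line, geno_line in zip(snp_lines, geno_lines):
        fields = snp_line.split()
        rsid, ref, alt = fields[0], fields[4], fields[5]
        # SNPs absent from the sample are treated as no-calls
        code = encode_genotype(sample_genotypes.get(rsid, "--"), ref, alt)
        merged.append(geno_line.rstrip("\r\n") + code)
    return merged


if __name__ == "__main__":
    snp = ["rs1 1 0.0 100 A G", "rs2 1 0.0 200 C T", "rs3 2 0.0 300 G A"]
    geno = ["02", "11", "29"]  # toy matrix: 3 SNPs x 2 reference individuals
    print(merge_sample(snp, geno, {"rs1": "AG", "rs2": "TT"}))
```

A matching line for the new individual would also have to be appended to the `.ind` file so that the column count and the individual list stay in sync.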
AADR is preferred because it covers more populations, includes more ancient samples, and offers better modeling resolution than alternative datasets.
qpAdm estimates ancestry proportions by modeling a target population as a mixture of selected source populations.
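The core idea behind the mixture model can be shown with f4-statistics. Under the model `T = Σ w_k S_k` with weights summing to 1, f4 is linear in its first population, so for outgroups R0, Ri, Rj: `f4(T, R0; Ri, Rj) = Σ w_k · f4(S_k, R0; Ri, Rj)`. The toy sketch below recovers the weights from precomputed f4 values by least squares with the sum-to-one constraint. It is not the qpAdm algorithm: it omits the block-jackknife standard errors, the error weighting, and the rank test that produces qpAdm's p-value. The function name is hypothetical.

```python
import numpy as np


def estimate_weights(f4_target, f4_sources):
    """Least-squares mixture weights constrained to sum to 1.

    f4_target:  shape (m,)   f4 values for the target, one per outgroup statistic.
    f4_sources: shape (m, n) the same statistics for each of n sources (columns).
    """
    f4_target = np.asarray(f4_target, dtype=float)
    f4_sources = np.asarray(f4_sources, dtype=float)
    last = f4_sources[:, -1]
    # Substitute w_n = 1 - sum(w_1..w_{n-1}) so the constraint holds exactly:
    # target - S_n = sum_{k<n} w_k * (S_k - S_n)
    design = f4_sources[:, :-1] - last[:, None]
    w_free, *_ = np.linalg.lstsq(design, f4_target - last, rcond=None)
    return np.append(w_free, 1.0 - w_free.sum())


if __name__ == "__main__":
    # Toy f4 values (not real data): 4 outgroup statistics x 2 sources
    sources = np.array([[0.01, 0.03], [0.02, -0.01], [0.00, 0.02], [-0.01, 0.04]])
    target = sources @ np.array([0.3, 0.7])  # target built as a 30/70 mixture
    print(estimate_weights(target, sources))
```

With real data the fit is never exact; estimated weights outside the range 0 to 1 are a sign that the model is inappropriate even before any p-value is considered.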
For Indian population structure analysis, major ancestral components often include:
- AASI (Ancient Ancestral South Indian)
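- IVC (Indus Valley Civilisation-related ancestry, closely linked to Iranian farmer-related ancestry)
- Steppe (Steppe pastoralist-related ancestry, associated with the spread of Indo-European languages)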
Process:
- Select the target population
- Choose 7–8 candidate source populations (each model usually combines only a few of them)
- Define the outgroup ("right") populations
- Execute qpAdm runs (automated batch runs supported)
- P-value > 0.05 → Model Pass (the model is not rejected by the data, provided the estimated weights also fall between 0 and 1)
- P-value < 0.05 → Model Fail (the model is rejected)
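The batch step and the pass/fail rule can be sketched as follows. The population labels are hypothetical placeholders, `build_models` and `classify` are illustrative helpers rather than this repository's functions, and the actual qpAdm call (for example through ADMIXTOOLS) is deliberately left out. Besides the p-value threshold of 0.05, `classify` applies the common extra check that all admixture weights lie between 0 and 1.

```python
from itertools import combinations

P_THRESHOLD = 0.05  # p-value cutoff for accepting a model


def build_models(target, candidate_sources, outgroups, n_sources):
    """Return one model spec (dict) per combination of n_sources candidates."""
    return [
        {"target": target, "sources": list(combo), "outgroups": list(outgroups)}
        for combo in combinations(candidate_sources, n_sources)
    ]


def classify(p_value, weights):
    """Pass if p > threshold and every weight lies in [0, 1]; otherwise Fail."""
    feasible = all(0.0 <= w <= 1.0 for w in weights)
    return "Pass" if p_value > P_THRESHOLD and feasible else "Fail"


if __name__ == "__main__":
    # Hypothetical labels; real runs would use population names from the .ind file
    models = build_models(
        "Target_Sample",
        ["AASI_proxy", "IVC_proxy", "Steppe_proxy", "Other_proxy"],
        ["Outgroup_1", "Outgroup_2", "Outgroup_3"],
        n_sources=3,
    )
    for model in models:
        print(model["sources"])
    print(classify(0.21, [0.45, 0.40, 0.15]))
```

Generating every combination up front keeps each run independent, so the batch can be parallelised or resumed without recomputing finished models.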
