VarFilter

A comprehensive variant filtering and population genetics analysis pipeline for whole-exome sequencing (WES) data from gnomAD v2.1.1, designed to identify population-specific genetic variants through systematic quality control and allele frequency analysis across seven ancestral populations.

This study identifies population-specific common and rare genetic variants by analyzing allele frequency differences across populations using gnomAD v2.1.1 exome sequencing data. We developed a customized filtering pipeline that performs rigorous quality control and stratified analysis across seven genetic ancestries: East Asian (EAS), South Asian (SAS), Non-Finnish European (NFE), Finnish (FIN), African (AFR), Admixed American (AMR), and Ashkenazi Jewish (ASJ).

Variant extraction

First, we used BCFtools to decompress the VCF files and compute variant statistics for each chromosome. Next, we wrote a Python script that uses the cyvcf2 package to extract allele frequencies and other relevant fields from the VCF files and write the results to a standard TSV format.
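The extraction step might look roughly like the sketch below. It is not the script shipped in this repository. The function names, the output column order, and the `NA` placeholder are illustrative assumptions. The INFO keys (`AC_eas`, `AN_eas`, `AF_eas`, and so on) follow the gnomAD v2.1.1 naming as we understand it and should be checked against the VCF header.

```python
import csv

# Population suffixes assumed to match the gnomAD v2.1.1 INFO keys (e.g. AC_eas).
POPULATIONS = ["eas", "sas", "nfe", "fin", "afr", "amr", "asj"]
METRICS = ["AC", "AN", "AF"]

HEADER = ["CHROM", "POS", "REF", "ALT"] + [
    f"{metric}_{pop}" for pop in POPULATIONS for metric in METRICS
]


def variant_row(chrom, pos, ref, alt, info):
    """Build one TSV row; `info` is any object with a dict-like .get()."""
    row = [chrom, pos, ref, alt]
    for pop in POPULATIONS:
        for metric in METRICS:
            value = info.get(f"{metric}_{pop}")
            # Missing INFO fields are written as "NA" (an arbitrary choice).
            row.append("NA" if value is None else value)
    return row


def extract_to_tsv(vcf_path, tsv_path):
    """Write per-population AC/AN/AF for every record in a VCF to a TSV file."""
    # Imported here so variant_row() can be used without cyvcf2 installed.
    from cyvcf2 import VCF

    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(HEADER)
        for variant in VCF(vcf_path):
            alt = ",".join(variant.ALT)
            writer.writerow(
                variant_row(variant.CHROM, variant.POS, variant.REF, alt, variant.INFO)
            )
```

Keeping the row-building logic separate from the file iteration makes it possible to check the column layout on a plain dictionary before running it on a full chromosome.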

Variant quality control

To filter the gnomAD v2.1.1 variants, we first performed quality control based on per-population allele count (AC) and allele number (AN) values. We then applied two population-specific filtering models, Model A and Model B, which cover opposite scenarios: variants enriched in the target population and variants depleted in it.

Step	Description	Number of Variants
0	Initial VCF extraction	17,209,972
1	AC QC: Keep variants with AC > 0 in at least one population	15,425,384
2	AN QC: Keep variants with AN > 0 in all seven populations	15,417,683
3.1	Call Rate 10% QC: AN > 10% of maximum AN in all populations	15,408,487
3.2	Call Rate 20% QC: AN > 20% of maximum AN in all populations	15,404,555
3.3	Call Rate 30% QC: AN > 30% of maximum AN in all populations	15,401,073
3.4	Call Rate 40% QC: AN > 40% of maximum AN in all populations	15,397,425

Note

Rationale: Call rate thresholds ensure adequate sequencing coverage across all populations. Higher thresholds (e.g., 40%) give greater confidence in the allele frequency estimates at the cost of a small reduction in variant count (15,417,683 after Step 2 versus 15,397,425 after Step 3.4).
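The QC steps in the table above can be expressed as simple predicates over per-population values. The sketch below is illustrative, not the pipeline's actual code. It assumes `ac`, `an`, and `max_an` are dictionaries keyed by population, and that "maximum AN" means the largest AN observed for each population across the dataset. That interpretation is an assumption, as are all function names.

```python
POPULATIONS = ["eas", "sas", "nfe", "fin", "afr", "amr", "asj"]


def passes_ac_qc(ac):
    """Step 1: AC > 0 in at least one population."""
    return any(ac[pop] > 0 for pop in POPULATIONS)


def passes_an_qc(an):
    """Step 2: AN > 0 in all seven populations."""
    return all(an[pop] > 0 for pop in POPULATIONS)


def passes_call_rate_qc(an, max_an, fraction):
    """Step 3.x: AN exceeds `fraction` of the maximum AN in every population.

    `max_an` is assumed to hold the per-population maximum AN.
    """
    return all(an[pop] > fraction * max_an[pop] for pop in POPULATIONS)


def passes_qc(ac, an, max_an, fraction=None):
    """Apply Steps 1 and 2, then the call rate step if `fraction` is given."""
    if not (passes_ac_qc(ac) and passes_an_qc(an)):
        return False
    return fraction is None or passes_call_rate_qc(an, max_an, fraction)
```

Calling `passes_qc(ac, an, max_an)` corresponds to the Step 2 dataset, and `fraction=0.1`, `0.2`, `0.3`, or `0.4` corresponds to Steps 3.1 through 3.4.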

Population-Specific Variant Filtering

For each target population, we applied 32 filtering combinations derived from two complementary approaches.

Model A: Common in target, rare in others (16 combinations)

Target population AC ≥ {1, 5, 10, 20}
Reference populations (all 6) AF ≤ {0.5, 0.1, 0.05, 0.01}
Example: EAS AC ≥ 10 AND (SAS, NFE, FIN, AFR, AMR, ASJ) all AF ≤ 0.01
- Interpretation: Variants present in East Asians but rare in other populations

Model B: Rare in target, common in others (16 combinations)

Target population AF ≤ {0.5, 0.1, 0.05, 0.01}
Reference populations (all 6) AC ≥ {1, 5, 10, 20}
Example: EAS AF ≤ 0.01 AND (SAS, NFE, FIN, AFR, AMR, ASJ) all AC ≥ 10
- Interpretation: Variants common in other populations but rare in East Asians
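A minimal sketch of the two models as predicates, using the EAS examples above. The function names and the dictionary-based inputs (`ac` and `af` keyed by population) are assumptions made for illustration and are not taken from the repository's scripts.

```python
POPULATIONS = ["eas", "sas", "nfe", "fin", "afr", "amr", "asj"]


def reference_populations(target):
    """The six populations other than the target."""
    return [pop for pop in POPULATIONS if pop != target]


def model_a(ac, af, target, ac_min, af_max):
    """Model A: target AC >= ac_min AND every reference AF <= af_max."""
    refs = reference_populations(target)
    return ac[target] >= ac_min and all(af[pop] <= af_max for pop in refs)


def model_b(ac, af, target, af_max, ac_min):
    """Model B: target AF <= af_max AND every reference AC >= ac_min."""
    refs = reference_populations(target)
    return af[target] <= af_max and all(ac[pop] >= ac_min for pop in refs)


# Toy variant: present in EAS, nearly absent elsewhere.
ac = {"eas": 12, "sas": 0, "nfe": 1, "fin": 0, "afr": 0, "amr": 0, "asj": 0}
af = {"eas": 0.0007, "sas": 0.0, "nfe": 0.00001, "fin": 0.0,
      "afr": 0.0, "amr": 0.0, "asj": 0.0}

is_eas_specific = model_a(ac, af, "eas", ac_min=10, af_max=0.01)
is_eas_depleted = model_b(ac, af, "eas", af_max=0.01, ac_min=10)
```

Both models take the same inputs and differ only in which side the AC and AF thresholds apply to, so a single loop over targets and thresholds can drive either one.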

Filtering Matrix

7 populations × 32 filtering combinations = 224 population-specific variant sets

These 224 filtering conditions were applied to each of the five quality-controlled datasets (Step 2 + Step 3.1–3.4), generating:

224 combinations × 5 call rate QC levels = 1,120 population-specific variant files
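The counts above follow from a Cartesian product of the parameters, which can be checked with a short enumeration. The QC level labels and the tuple layout below are hypothetical and chosen only for illustration.

```python
from itertools import product

POPULATIONS = ["eas", "sas", "nfe", "fin", "afr", "amr", "asj"]
MODELS = ["A", "B"]
AC_THRESHOLDS = [1, 5, 10, 20]
AF_THRESHOLDS = [0.5, 0.1, 0.05, 0.01]
# Hypothetical labels for the Step 2 dataset and the four call rate datasets.
QC_LEVELS = ["step2", "cr10", "cr20", "cr30", "cr40"]


def filter_grid():
    """Enumerate every (QC level, target, model, AC, AF) combination."""
    return list(product(QC_LEVELS, POPULATIONS, MODELS, AC_THRESHOLDS, AF_THRESHOLDS))


grid = filter_grid()
per_population = len(MODELS) * len(AC_THRESHOLDS) * len(AF_THRESHOLDS)  # 32
per_qc_level = len(POPULATIONS) * per_population                         # 224
total_files = len(grid)                                                  # 1,120
```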

Scenario	Target Pop	Target Pop AF Threshold	Ref Pops	Ref Pop AF Threshold	Interpretation
A	EAS	AF ≥ 0.20	SAS, NFE, FIN, AFR, AMR, ASJ	AF ≤ 0.01	East Asian-specific common variant
B	NFE	AF ≥ 0.01	EAS, SAS, FIN, AFR, AMR, ASJ	AF ≤ 0.05	European-enriched low-frequency variant (moderate stringency)
C	AFR	AF ≤ 0.01	EAS, SAS, NFE, FIN, AMR, ASJ	AF ≥ 0.20	Pan-ancestral common variant depleted in Africans
