A pure-Python implementation of the Horvath 2013 pan-tissue DNA methylation age predictor. Calculate biological age from CpG beta values, with no dependencies required.
Your DNA sequence stays (mostly) the same throughout life, but how it's read changes dramatically. DNA methylation, the addition of methyl groups to cytosine bases at CpG dinucleotides, is one of the most studied epigenetic modifications. It acts as a molecular switch: methylated promoters are typically silenced, while unmethylated ones are active.
In 2013, Steve Horvath made a landmark discovery: methylation levels at 353 specific CpG sites change so predictably with age that they can be used as a biological clock. This wasn't just measuring one gene; it was a multi-tissue model trained on 8,000+ samples across 51 tissue and cell types, from brain to blood to liver.
The result: a predictor that estimates biological age with a median error of ~3.6 years across tissues. Crucially, biological age ≠ chronological age. The difference between them, called age acceleration, captures something real about health and mortality risk.
Epigenetic clocks have become one of the most important biomarkers in aging research:
| Application | Why It Matters |
|---|---|
| Mortality prediction | Age acceleration predicts all-cause mortality, independent of traditional risk factors |
| Disease risk | Accelerated aging correlates with cardiovascular disease, cancer, Alzheimer's |
| Intervention testing | A widely used molecular readout for whether an anti-aging intervention is "working" |
| Tissue-specific aging | Different organs age at different rates, and clocks can detect this |
| Lifestyle effects | Diet, exercise, stress, and sleep all leave methylation signatures |
| Rejuvenation research | Yamanaka factor reprogramming resets the epigenetic clock (Horvath clock specifically) |
The Horvath clock was the first pan-tissue clock, and it remains the most widely used. Later clocks (Hannum, PhenoAge/Levine, GrimAge, DunedinPACE) optimize for different outcomes, but Horvath's remains the gold standard for estimating intrinsic epigenetic age.
The clock is an elastic net regression:
DNAm_age = intercept + Σ(coefficient_i × beta_i) for i = 1..353
Where beta_i is the methylation level (0.0 = unmethylated, 1.0 = fully methylated) at each of the 353 selected CpG sites.
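The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name `raw_dnam_age`, the site names, and all numbers are hypothetical, and skipping sites that are missing from the sample is an assumption about how gaps are handled.

```python
def raw_dnam_age(betas, coefficients, intercept):
    """Return intercept + sum(coefficient_i * beta_i) over clock sites found in `betas`."""
    total = intercept
    for site, coefficient in coefficients.items():
        # Assumption: sites absent from the sample contribute nothing.
        if site in betas:
            total += coefficient * betas[site]
    return total


# Hypothetical toy values for illustration only; these are not Horvath coefficients.
toy_coefficients = {"cg_site_a": 0.5, "cg_site_b": -0.25}
toy_betas = {"cg_site_a": 0.4, "cg_site_b": 0.8}

print(raw_dnam_age(toy_betas, toy_coefficients, intercept=0.1))
```

The value returned here is still on the transformed scale; it has to pass through the inverse calibration function before it can be read as an age in years.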
Horvath uses a calibration function to handle the non-linear relationship between methylation age and chronological age (aging is faster during development):
Forward: F(age) = log(age + 1) − log(21)
Inverse: age = exp(DNAm_age + log(21)) − 1
The adult age threshold (20) reflects the transition from rapid developmental changes to the slower, more linear aging trajectory of adulthood.
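A rough sketch of the calibration in Python follows. The logarithmic branch matches the formulas above. The linear branch for ages above the threshold follows the transformation as published in Horvath (2013) and is not spelled out in this README, so treat it as an assumption here. The function names are illustrative and may not match the ones in `epi_clock.py`.

```python
import math

ADULT_AGE = 20  # adult age threshold described above


def forward_transform(age, adult_age=ADULT_AGE):
    """Map chronological age onto the scale the regression is fitted on."""
    if age <= adult_age:
        # Logarithmic branch: log(age + 1) - log(21) when adult_age is 20.
        return math.log(age + 1) - math.log(adult_age + 1)
    # Linear branch for adults (assumed from the published transformation).
    return (age - adult_age) / (adult_age + 1)


def inverse_transform(dnam_age, adult_age=ADULT_AGE):
    """Map a raw regression output back to an age in years."""
    if dnam_age < 0:
        # Inverse of the logarithmic branch: exp(x + log(21)) - 1.
        return math.exp(dnam_age + math.log(adult_age + 1)) - 1
    # Inverse of the linear branch.
    return (adult_age + 1) * dnam_age + adult_age


for age in (5, 20, 45):
    print(age, inverse_transform(forward_transform(age)))
```

The two branches meet at the threshold: an age of 20 maps to 0 on the transformed scale, so negative raw values correspond to ages below 20.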
ΔAge = predicted_biological_age − chronological_age
- Positive: accelerated aging (biologically older than expected)
- Negative: decelerated aging (biologically younger than expected)
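As a small illustration of the sign convention, assuming hypothetical helper names (the tool's own interpretation text may differ or use a tolerance band around zero):

```python
def age_acceleration(predicted_biological_age, chronological_age):
    """Difference between predicted and chronological age, in years."""
    return predicted_biological_age - chronological_age


def interpret(delta):
    """Label the sign of an age acceleration value."""
    if delta > 0:
        return "accelerated"
    if delta < 0:
        return "decelerated"
    return "no difference"


delta = age_acceleration(48.5, 45)  # made-up ages for illustration
print(delta, interpret(delta))
```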
This tool implements the core Horvath clock framework with a representative subset of 36 CpG sites (the most influential by coefficient magnitude). The full 353-site model coefficients are available from Horvath's supplementary data.
The subset captures the strongest signals in the clock and is sufficient for:
- Understanding how the clock works
- Testing with synthetic or partial data
- Educational demonstrations
For research-grade predictions, load the complete coefficient set.
No installation needed: pure Python 3 standard library.
# Clone or copy the project
cd epi-clock/
# Make executable (optional)
chmod +x epi_clock.py

python3 epi_clock.py demo

Runs the clock on three synthetic samples (young adult, middle-aged, elderly) with educational output explaining the biology.
# Basic prediction
python3 epi_clock.py calculate sample.csv
# With chronological age comparison
python3 epi_clock.py calculate sample.csv --age 45
# Verbose mode (show CpG site contributions)
python3 epi_clock.py calculate sample.csv --age 45 -v

python3 epi_clock.py compare young_tissue.csv old_tissue.csv
python3 epi_clock.py compare tumor.csv normal.csv --age 60

# Generate CSV files for testing
python3 epi_clock.py export-demo --output-dir ./demo_data/

python3 epi_clock.py info

Two columns: CpG probe ID and beta value.
CpG_site,beta_value
cg22736354,0.2341
cg06493994,0.0812
cg24724428,0.5623
...

Standard GEO series matrix layout (rows = probes, columns = samples):
ID_REF GSM123456 GSM123457 GSM123458
cg22736354 0.2341 0.4521 0.6712
cg06493994 0.0812 0.1234 0.2345
cg24724428 0.5623 0.6789 0.7891
...

The tool produces:
- Predicted biological age with raw DNAm age
- ASCII bar chart comparing biological vs chronological age
- Age acceleration value with interpretation (if `--age` provided)
- CpG coverage report: how many clock sites were found in your data
- Confidence indicator: based on coverage quality
- Top CpG contributors: which sites most influenced the prediction (verbose mode)
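To make the coverage report concrete, here is a minimal sketch that parses the two-column CSV format shown earlier and counts how many clock sites are present. The function names and the short `clock_sites` list are illustrative assumptions, not the tool's internals. How coverage maps to a confidence label is not specified in this README, so the sketch stops at the fraction.

```python
import csv
import io


def parse_beta_csv(text):
    """Parse 'CpG_site,beta_value' rows into a {probe_id: beta} dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["CpG_site"]: float(row["beta_value"]) for row in reader}


def coverage(betas, clock_sites):
    """Return (found, total, fraction) of clock sites present in the sample."""
    found = sum(1 for site in clock_sites if site in betas)
    total = len(clock_sites)
    return found, total, (found / total if total else 0.0)


sample_text = """CpG_site,beta_value
cg22736354,0.2341
cg06493994,0.0812
cg24724428,0.5623
"""

# Illustrative subset of probe IDs mentioned in this README, not the full clock.
clock_sites = ["cg22736354", "cg24724428", "cg22454769", "cg12830694"]

betas = parse_beta_csv(sample_text)
found, total, fraction = coverage(betas, clock_sites)
print(f"{found}/{total} clock sites found ({fraction:.0%})")
```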
Some notable sites in the Horvath clock:
| CpG | Gene | Role |
|---|---|---|
| cg24724428 | ELOVL2 | Fatty acid elongation; one of the strongest age-correlated CpGs known |
| cg22736354 | NHLRC1 | Largest positive coefficient in the clock |
| cg22454769 | FHL2 | Four-and-a-half LIM domains; robust aging marker |
| cg12830694 | LDB2 | Largest negative coefficient; methylation decreases with age |
| cg23606718 | RPTOR | Regulatory subunit of mTORC1; connects to nutrient sensing & longevity |
- Subset model: Uses 36 of 353 CpG sites, which is sufficient for demonstration but not publication-grade
- Synthetic calibration: Demo data is algorithmically generated, not from real methylation arrays
- No normalization: Real methylation data requires BMIQ or similar normalization before clock calculation
- Platform specific: Designed for Illumina 27K/450K/EPIC array probe IDs
- No cell type deconvolution: The full Horvath pipeline includes blood cell type adjustment
To use the complete model:
- Download coefficients from Horvath's website or the R `methylclock` package
- Replace the `HORVATH_CLOCK_SITES` dictionary with all 353 entries
- The rest of the code (transformation, prediction, display) works unchanged
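One way the replacement step might look, as a sketch under stated assumptions: the column names `CpGmarker` and `CoefficientTraining` and the `(Intercept)` row label are assumptions about the layout of the coefficient file, so check them against the file you actually download. The sketch also assumes `HORVATH_CLOCK_SITES` maps probe IDs to coefficients, which may not match the structure used in `epi_clock.py`. The table below is made up and only stands in for the real file.

```python
import csv
import io


def load_coefficients(handle, id_column="CpGmarker",
                      coef_column="CoefficientTraining",
                      intercept_label="(Intercept)"):
    """Read a coefficient table into (intercept, {probe_id: coefficient})."""
    intercept = 0.0
    sites = {}
    for row in csv.DictReader(handle):
        name = row[id_column].strip()
        value = float(row[coef_column])
        if name == intercept_label:
            intercept = value
        else:
            sites[name] = value
    return intercept, sites


# Toy table with made-up numbers; replace with open("your_coefficients.csv", newline="").
toy_table = io.StringIO(
    "CpGmarker,CoefficientTraining\n"
    "(Intercept),0.5\n"
    "cg_example_1,0.25\n"
    "cg_example_2,-0.75\n"
)

intercept, HORVATH_CLOCK_SITES = load_coefficients(toy_table)
print(intercept, len(HORVATH_CLOCK_SITES))
```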
| Clock | Year | Sites | Optimized For |
|---|---|---|---|
| Horvath | 2013 | 353 | Pan-tissue chronological age |
| Hannum | 2013 | 71 | Blood chronological age |
| PhenoAge | 2018 | 513 | Mortality / phenotypic age |
| GrimAge | 2019 | 1030 | Mortality (smoking, proteins) |
| DunedinPACE | 2022 | 173 | Rate of aging (pace) |
- Horvath, S. (2013). DNA methylation age of human tissues and cell types. Genome Biology, 14(10), R115. DOI: 10.1186/gb-2013-14-10-r115
- Hannum, G., et al. (2013). Genome-wide methylation profiles reveal quantitative views of human aging rates. Molecular Cell, 49(2), 359-367.
- Levine, M.E., et al. (2018). An epigenetic biomarker of aging for lifespan and healthspan. Aging, 10(4), 573-591.
- Lu, A.T., et al. (2019). DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging, 11(2), 303-327.
- Belsky, D.W., et al. (2022). DunedinPACE, a DNA methylation biomarker of the pace of aging. eLife, 11, e73420.
Built as an educational bioinformatics tool. For research applications, use the full 353-site model with properly normalized array data.