QclawQ/epi-clock

🧬 Horvath Pan-Tissue Epigenetic Clock

A pure-Python implementation of the Horvath 2013 pan-tissue DNA methylation age predictor. It calculates biological age from CpG beta values, with no dependencies required.


What Is an Epigenetic Clock?

Your DNA sequence stays (mostly) the same throughout life, but how it's read changes dramatically. DNA methylation, the addition of methyl groups to cytosine bases at CpG dinucleotides, is one of the most studied epigenetic modifications. It acts as a molecular switch: methylated promoters are typically silenced, while unmethylated ones are active.

In 2013, Steve Horvath made a landmark discovery: methylation levels at 353 specific CpG sites change so predictably with age that they can be used as a biological clock. This wasn't a single-gene measurement: it was a multi-tissue model built from roughly 8,000 samples across 51 tissue and cell types, from brain to blood to liver.

The result: a predictor that estimates biological age with a median absolute error of about 3.6 years across tissues. Crucially, biological age ≠ chronological age. The difference between them, called age acceleration, is associated with health outcomes and mortality risk.

Why Does This Matter?

Epigenetic clocks have become one of the most important biomarkers in aging research:

| Application | Why It Matters |
|---|---|
| Mortality prediction | Age acceleration predicts all-cause mortality, independent of traditional risk factors |
| Disease risk | Accelerated aging correlates with cardiovascular disease, cancer, Alzheimer's |
| Intervention testing | A widely used molecular readout for whether an anti-aging intervention is "working" |
| Tissue-specific aging | Different organs age at different rates, and clocks can detect this |
| Lifestyle effects | Diet, exercise, stress, and sleep all leave methylation signatures |
| Rejuvenation research | Yamanaka factor reprogramming resets the epigenetic clock (Horvath clock specifically) |

The Horvath clock was the first pan-tissue clock and is still among the most widely used. Later clocks (Hannum, PhenoAge/Levine, GrimAge, DunedinPACE) optimize for different outcomes, but Horvath's remains a standard reference for estimating intrinsic epigenetic age.

The Math

Linear Model

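The clock is a linear model whose coefficients were fit by elastic net regression. Its output is on the transformed scale described under Age Transformation:

DNAm_age = intercept + Σ(coefficient_i × beta_i)   for i = 1..353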

Where beta_i is the methylation level (0.0 = unmethylated, 1.0 = fully methylated) at each of the 353 selected CpG sites.
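As a minimal sketch of the weighted sum above, the function below scores one sample against a coefficient table. The names `raw_score`, `toy_coefficients`, and `toy_betas` are illustrative assumptions, and the numbers are made up rather than Horvath's published coefficients. Skipping sites that are absent from the sample is one possible policy, not necessarily the one this tool uses.

```python
def raw_score(betas, coefficients, intercept):
    """Return intercept + sum(coefficient_i * beta_i) over sites present in the sample."""
    total = intercept
    for site, coef in coefficients.items():
        if site in betas:  # assumption: missing sites contribute nothing
            total += coef * betas[site]
    return total


# Hypothetical coefficients and beta values, for illustration only.
toy_coefficients = {"cgA": 0.5, "cgB": -0.25}
toy_betas = {"cgA": 0.4, "cgB": 0.8, "cgUnused": 0.9}

score = raw_score(toy_betas, toy_coefficients, intercept=0.1)
```

Probes that are not in the coefficient table (here `cgUnused`) are ignored, which is why a full array file can be passed in unchanged.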

Age Transformation

Horvath uses a calibration function to handle the non-linear relationship between methylation age and chronological age (aging is faster during development):

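Forward:   F(age) = log(age + 1) − log(21)        if age ≤ 20
           F(age) = (age − 20) / 21               if age > 20
Inverse:   age = exp(DNAm_age + log(21)) − 1      if DNAm_age < 0
           age = 21 × DNAm_age + 20               if DNAm_age ≥ 0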

The adult age threshold (20) reflects the transition from rapid developmental changes to the slower, more linear aging trajectory of adulthood.
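Horvath's published calibration is piecewise: logarithmic up to the adult age threshold and linear above it. The sketch below writes both directions out so the round trip can be checked. The function names `forward` and `inverse` and the constant `ADULT_AGE` are illustrative and may not match the names used in `epi_clock.py`.

```python
import math

ADULT_AGE = 20  # adult age threshold from Horvath (2013)


def forward(age, adult_age=ADULT_AGE):
    """Map chronological age onto the transformed scale the regression is fit on."""
    if age <= adult_age:
        return math.log(age + 1) - math.log(adult_age + 1)
    return (age - adult_age) / (adult_age + 1)


def inverse(score, adult_age=ADULT_AGE):
    """Map a raw linear-model score back to an age in years."""
    if score < 0:
        return math.exp(score + math.log(adult_age + 1)) - 1
    return (adult_age + 1) * score + adult_age


# Round trip for a child and an adult.
child_age = inverse(forward(5))
adult_age_years = inverse(forward(45))
```

The two branches meet at the threshold: `forward(20)` is 0, and a raw score of 0 maps back to age 20.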

Age Acceleration

ΔAge = predicted_biological_age − chronological_age
  • Positive → accelerated aging (biologically older than expected)
  • Negative → decelerated aging (biologically younger than expected)
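The sign convention above can be sketched in a few lines. The function names and the label strings are assumptions made for illustration. The tool's own interpretation text may differ.

```python
def age_acceleration(predicted_age, chronological_age):
    """Return predicted biological age minus chronological age, in years."""
    return predicted_age - chronological_age


def interpret(delta):
    """Label the sign of an age-acceleration value (labels are illustrative)."""
    if delta > 0:
        return "accelerated"
    if delta < 0:
        return "decelerated"
    return "no difference"


delta = age_acceleration(52.3, 45)
label = interpret(delta)
```

A small difference is well within the clock's error, so in practice the sign alone should not be over-interpreted for a single sample.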

This Implementation

This tool implements the core Horvath clock framework with a representative subset of 36 CpG sites (the most influential by coefficient magnitude). The full 353-site model coefficients are available from Horvath's supplementary data.

The subset captures the strongest signals in the clock and is sufficient for:

  • Understanding how the clock works
  • Testing with synthetic or partial data
  • Educational demonstrations

For research-grade predictions, load the complete coefficient set.
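Selecting sites by coefficient magnitude, as described above, can be sketched as a sort on absolute value. The function name `top_sites_by_magnitude` and the toy coefficients are hypothetical. How the 36 sites in this repository were actually chosen is not shown here.

```python
def top_sites_by_magnitude(coefficients, n):
    """Return the n sites with the largest absolute coefficient, strongest first."""
    ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:n])


# Hypothetical coefficients, for illustration only.
toy = {"cgA": 0.12, "cgB": -0.90, "cgC": 0.45, "cgD": -0.03}
subset = top_sites_by_magnitude(toy, 2)
```

Large negative coefficients rank alongside large positive ones, since both move the prediction strongly. A truncated model is no longer calibrated the way the full model is, which is why the subset is suited to demonstration only.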

Installation

No installation needed: the tool uses only the Python 3 standard library.

# Clone or copy the project
cd epi-clock/

# Make executable (optional)
chmod +x epi_clock.py

Usage

Run the Demo

python3 epi_clock.py demo

Runs the clock on three synthetic samples (young adult, middle-aged, elderly) with educational output explaining the biology.

Calculate Biological Age

# Basic prediction
python3 epi_clock.py calculate sample.csv

# With chronological age comparison
python3 epi_clock.py calculate sample.csv --age 45

# Verbose mode (show CpG site contributions)
python3 epi_clock.py calculate sample.csv --age 45 -v

Compare Two Samples

python3 epi_clock.py compare young_tissue.csv old_tissue.csv
python3 epi_clock.py compare tumor.csv normal.csv --age 60

Export Demo Data

# Generate CSV files for testing
python3 epi_clock.py export-demo --output-dir ./demo_data/

View Clock Information

python3 epi_clock.py info

Input Formats

Simple CSV/TSV

Two columns: CpG probe ID and beta value.

CpG_site,beta_value
cg22736354,0.2341
cg06493994,0.0812
cg24724428,0.5623
...
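A two-column file like the one above can be read with the standard `csv` module. The sketch below is one way to do it, not necessarily how `epi_clock.py` parses input. The function name `read_beta_table`, the rule of keeping only rows whose ID starts with `cg`, and the range check on beta values are all assumptions.

```python
import csv
import io


def read_beta_table(handle, delimiter=","):
    """Parse 'probe_id,beta' rows into a dict, skipping the header and malformed rows."""
    betas = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue
        probe_id = row[0].strip()
        if not probe_id.startswith("cg"):  # assumption: this also skips the header row
            continue
        try:
            value = float(row[1])
        except ValueError:
            continue  # non-numeric cell such as 'NA'
        if 0.0 <= value <= 1.0:  # beta values are proportions
            betas[probe_id] = value
    return betas


sample_text = (
    "CpG_site,beta_value\n"
    "cg22736354,0.2341\n"
    "cg06493994,0.0812\n"
    "cg24724428,0.5623\n"
)
betas = read_beta_table(io.StringIO(sample_text))
```

For a TSV file, the same function can be called with `delimiter="\t"`.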

GEO Matrix Format

Standard GEO series matrix layout (rows = probes, columns = samples):

ID_REF	GSM123456	GSM123457	GSM123458
cg22736354	0.2341	0.4521	0.6712
cg06493994	0.0812	0.1234	0.2345
cg24724428	0.5623	0.6789	0.7891
...
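For the matrix layout above, each sample column becomes its own probe-to-beta mapping. This is a hedged sketch using only the standard library. The function name `read_geo_matrix` is hypothetical, and skipping lines that begin with `!` assumes the usual metadata prefix found in GEO series matrix files.

```python
import csv
import io


def read_geo_matrix(handle):
    """Return {sample_id: {probe_id: beta}} from a tab-separated probe-by-sample table."""
    samples = {}
    sample_ids = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("!"):
            continue  # blank line or '!'-prefixed metadata line
        if row[0] == "ID_REF":
            sample_ids = row[1:]
            samples = {sample_id: {} for sample_id in sample_ids}
            continue
        for sample_id, cell in zip(sample_ids, row[1:]):
            try:
                samples[sample_id][row[0]] = float(cell)
            except ValueError:
                pass  # missing or non-numeric cell
    return samples


matrix_text = (
    "ID_REF\tGSM123456\tGSM123457\tGSM123458\n"
    "cg22736354\t0.2341\t0.4521\t0.6712\n"
    "cg06493994\t0.0812\t0.1234\t0.2345\n"
    "cg24724428\t0.5623\t0.6789\t0.7891\n"
)
samples = read_geo_matrix(io.StringIO(matrix_text))
```

Each inner dictionary has the same shape as a parsed two-column file, so one sample at a time can be passed to the same scoring code.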

Output

The tool produces:

  • Predicted biological age with raw DNAm age
  • ASCII bar chart comparing biological vs chronological age
  • Age acceleration value with interpretation (if --age provided)
  • CpG coverage report: how many clock sites were found in your data
  • Confidence indicator: based on coverage quality
  • Top CpG contributors: which sites most influenced the prediction (verbose mode)
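The coverage report and confidence indicator listed above can be sketched as a set comparison between the clock's sites and the probes in a sample. The function name `coverage_report` and the 90% and 50% cutoffs are assumptions chosen for illustration. The thresholds the tool actually applies are not documented here.

```python
def coverage_report(betas, clock_sites):
    """Summarize how many clock sites are present in a sample."""
    found = [site for site in clock_sites if site in betas]
    missing = [site for site in clock_sites if site not in betas]
    fraction = len(found) / len(clock_sites) if clock_sites else 0.0
    # Hypothetical cutoffs, not taken from the tool.
    if fraction >= 0.9:
        confidence = "high"
    elif fraction >= 0.5:
        confidence = "moderate"
    else:
        confidence = "low"
    return {
        "found": len(found),
        "total": len(clock_sites),
        "fraction": fraction,
        "missing": missing,
        "confidence": confidence,
    }


# Toy probe IDs, for illustration only.
report = coverage_report(
    {"cg1": 0.2, "cg2": 0.7, "cgOther": 0.5},
    ["cg1", "cg2", "cg3", "cg4"],
)
```

Listing the missing probe IDs, rather than only counting them, makes it easier to spot a platform mismatch or a filtering step that dropped clock sites.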

Key CpG Sites

Some notable sites in the Horvath clock:

| CpG | Gene | Role |
|---|---|---|
| cg24724428 | ELOVL2 | Fatty acid elongation; one of the strongest age-correlated CpGs known |
| cg22736354 | NHLRC1 | Largest positive coefficient in the clock |
| cg22454769 | FHL2 | Four-and-a-half LIM domains; robust aging marker |
| cg12830694 | LDB2 | Largest negative coefficient; methylation decreases with age |
| cg23606718 | RPTOR | Regulatory subunit of mTORC1; connects to nutrient sensing & longevity |

Limitations

  1. Subset model: Uses 36 of 353 CpG sites, which is sufficient for demonstration but not publication-grade
  2. Synthetic calibration: Demo data is algorithmically generated, not from real methylation arrays
  3. No normalization: Real methylation data requires BMIQ or similar normalization before clock calculation
  4. Platform-specific: Designed for Illumina 27K/450K/EPIC array probe IDs
  5. No cell type deconvolution: The full Horvath pipeline includes blood cell type adjustment

Extending to Full 353-Site Clock

To use the complete model:

  1. Download coefficients from Horvath's website or the R methylclock package
  2. Replace the HORVATH_CLOCK_SITES dictionary with all 353 entries
  3. The rest of the code (transformation, prediction, display) works unchanged
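Step 2 can be done by reading a coefficient file rather than typing in 353 entries. The sketch below assumes a CSV with one column of probe IDs, one column of coefficients, and an intercept row. The default column names and the `(Intercept)` label are assumptions to check against the file you download, and the function name `load_coefficients` is hypothetical. It returns a plain probe-to-coefficient mapping, so if `HORVATH_CLOCK_SITES` stores more per site (gene names, for example), the result needs adapting.

```python
import csv
import io


def load_coefficients(handle, site_column="CpGmarker",
                      coef_column="CoefficientTraining",
                      intercept_label="(Intercept)"):
    """Read (intercept, {probe_id: coefficient}) from a CSV coefficient table."""
    intercept = 0.0
    sites = {}
    for row in csv.DictReader(handle):
        name = row[site_column].strip()
        value = float(row[coef_column])
        if name == intercept_label:
            intercept = value
        else:
            sites[name] = value
    return intercept, sites


# Toy file with made-up values, for illustration only.
toy_file = io.StringIO(
    "CpGmarker,CoefficientTraining\n"
    "(Intercept),0.7\n"
    "cgA,0.5\n"
    "cgB,-0.25\n"
)
intercept, sites = load_coefficients(toy_file)
```

After loading the real file, checking that `len(sites)` equals 353 is a quick way to confirm that the columns were read as intended.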

Related Clocks

| Clock | Year | Sites | Optimized For |
|---|---|---|---|
| Horvath | 2013 | 353 | Pan-tissue chronological age |
| Hannum | 2013 | 71 | Blood chronological age |
| PhenoAge | 2018 | 513 | Mortality / phenotypic age |
| GrimAge | 2019 | 1030 | Mortality (smoking, proteins) |
| DunedinPACE | 2022 | 173 | Rate of aging (pace) |

References

  • Horvath, S. (2013). DNA methylation age of human tissues and cell types. Genome Biology, 14(10), R115. DOI: 10.1186/gb-2013-14-10-r115
  • Hannum, G., et al. (2013). Genome-wide methylation profiles reveal quantitative views of human aging rates. Molecular Cell, 49(2), 359-367.
  • Levine, M.E., et al. (2018). An epigenetic biomarker of aging for lifespan and healthspan. Aging, 10(4), 573-591.
  • Lu, A.T., et al. (2019). DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging, 11(2), 303-327.
  • Belsky, D.W., et al. (2022). DunedinPACE, a DNA methylation biomarker of the pace of aging. eLife, 11, e73420.

Built as an educational bioinformatics tool. For research applications, use the full 353-site model with properly normalized array data.
