PregnancyIdentifier

Identify pregnancy episodes in OMOP CDM data using the HIPPS algorithm (Smith et al. 2024, doi:10.1093/jamia/ocae195).


Observational health data rarely contains explicit pregnancy_start or pregnancy_end variables. More often it holds scattered pregnancy-related events, such as a live birth, a gestational week 12 record, a delivery procedure, or a miscarriage. PregnancyIdentifier turns these pregnancy-related codes into:

  • One row per pregnancy episode.
  • Inferred start and end dates (and precision) from gestational timing evidence.
  • Standard outcome categories (LB, SB, AB, SA, ECT, DELIV, PREG) you can use in analyses or exports.

The pipeline builds outcome-anchored episodes (HIP) and timing-anchored episodes (PPS), merges them (HIPPS), and then refines start dates (ESD), giving a consistent definition of a pregnancy episode across sites and data sources.


How to use it

Install (requires R ≥ 4.1 and CDMConnector):

# From GitHub (DARWIN EU)
remotes::install_github("darwin-eu/PregnancyIdentifier")

Run the full pipeline (initializes concepts, runs HIP → PPS → merge → ESD, writes outputs):

library(PregnancyIdentifier)
library(CDMConnector)

cdm <- mockPregnancyCdm()  # or your real cdm_reference

runPregnancyIdentifier(
  cdm          = cdm,
  outputFolder = "pregnancy_output",
  startDate    = as.Date("2000-01-01"),
  endDate      = Sys.Date()
)

Use the result:
pregnancy_output/final_pregnancy_episodes.rds is a data frame with one row per pregnancy episode: person_id, final_episode_start_date, final_episode_end_date, final_outcome_category, esd_precision_days, and other esd_* QA/concordance columns. Shareable aggregated CSVs are written to pregnancy_output/export by default (override with exportFolder). Use exportPregnancies() separately if you need to re-export to a different folder.
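
As a quick check of the output, the episode table can be summarised by outcome category. The snippet below is a minimal sketch in base R, not part of the package: summarizeEpisodes() is a hypothetical helper, and it assumes the two date columns named above are stored as Date values. The file is only read if it exists, so the snippet is safe to run before the pipeline has produced any output.

```r
# Hypothetical helper (not a PregnancyIdentifier function): counts episodes and
# computes the median episode length in days for each outcome category.
# Assumes final_episode_start_date and final_episode_end_date are Date columns.
summarizeEpisodes <- function(episodes) {
  durationDays <- as.numeric(
    difftime(episodes$final_episode_end_date,
             episodes$final_episode_start_date,
             units = "days")
  )
  counts  <- table(episodes$final_outcome_category)
  medians <- tapply(durationDays, episodes$final_outcome_category,
                    median, na.rm = TRUE)
  data.frame(
    final_outcome_category = names(counts),
    n_episodes             = as.integer(counts),
    median_duration_days   = as.numeric(medians[names(counts)]),
    stringsAsFactors       = FALSE
  )
}

# Path taken from the example run above; adjust if outputFolder differs.
resultPath <- file.path("pregnancy_output", "final_pregnancy_episodes.rds")
if (file.exists(resultPath)) {
  episodes <- readRDS(resultPath)
  print(summarizeEpisodes(episodes))
}
```

Implausibly short or long median durations for a category are a reasonable prompt to inspect esd_precision_days and the other esd_* columns for the affected episodes.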


Documentation


License

Apache 2.0.