
Update as of April 6th #4

Open
larryvaldes wants to merge 21 commits into master from development

@larryvaldes (Owner)

No description provided.

Lazaro Valdes and others added 21 commits February 21, 2025 22:18
… and Drug_exposure.

Unnesting for Procedure.
Medication still needs work to improve its code mappings.

- Rewrite lot_base.sqlx to source from ANISH.linesOfTherapy (replaces
  HT.linesOfTherapy); exposes lineOfTherapyId, diseaseId, listOfProcedures,
  stemCellProcedureDates, and outcome for downstream use
- Update episode_lines_parent.sqlx to map on diseaseId (ANISH GUID) instead
  of diseaseName
- Rewrite episode_events_child.sqlx to derive Cancer Surgery (32939) episodes
  by unnesting listOfProcedures STRUCT from ANISH; joins SNOMED codes to OMOP
  concept for episode_object_concept_id
- Update episode.sqlx to produce all four CERTAINTY episode types: Cancer Drug
  Treatment (32941), Disease Episode (32533) from condition_occurrence, Cancer
  Surgery (32939), and Progression (32949) from artOutcome; resolves
  episode_parent_id for surgery and progression episodes
- Add episode_event.sqlx linking all episode types to clinical events per
  CERTAINTY §3.2.11 (drug_exposure, procedure_occurrence, measurement,
  observation)
- Add OMOP_CDM_54_Reference.md with full CDM v5.4 table specs and conventions
- Add CERTAINTY_OMOP_codebook_v2.1.docx as project guideline reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
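
The unnesting step in episode_events_child.sqlx can be illustrated with a minimal Python sketch of what a BigQuery `CROSS JOIN UNNEST(...)` over listOfProcedures typically does: one output row per procedure, and no rows for a LoT whose array is empty or missing. The struct keys `snomedCode` and `date` and the lookup dict `snomed_to_concept` are hypothetical stand-ins for the real ANISH schema and OMOP concept join; treating unmapped codes with LEFT JOIN semantics is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurgeryEvent:
    line_of_therapy_id: str
    snomed_code: Optional[str]
    procedure_date: Optional[str]
    episode_object_concept_id: Optional[int]


def unnest_procedures(lot_rows, snomed_to_concept):
    """Flatten each LoT's listOfProcedures array into one row per procedure.

    Like CROSS JOIN UNNEST, a LoT with an empty or missing array yields no rows.
    The struct keys "snomedCode" and "date" are hypothetical field names.
    """
    events = []
    for lot in lot_rows:
        for proc in lot.get("listOfProcedures") or []:
            code = proc.get("snomedCode")
            events.append(
                SurgeryEvent(
                    line_of_therapy_id=lot["lineOfTherapyId"],
                    snomed_code=code,
                    procedure_date=proc.get("date"),
                    # Assumed LEFT JOIN to concept: unmapped codes keep None.
                    episode_object_concept_id=snomed_to_concept.get(code),
                )
            )
    return events
```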

Old episode pipeline (fully replaced by newEpisode/):
- episode/episodeOld.sqlx
- episode/episodeWIP.sqlx
- episode/episode_event.sqlx (old version)
- episode/episode_idx.sqlx
- episode/episode_edge.sqlx
- episode/episode_seeds.sqlx
- episode/lot_baseOld.sqlx
- episode/lot_plain.sqlx
- episode/lot_mapped.sqlx
- idx/medReq_idx.sqlx

Unused drug exposure variants (zero refs):
- Drug_exposure/new_drug_exposure.sqlx
- Drug_exposure/new_drug_exposure2.sqlx
- Drug_exposure/drug_exposure_sections.sqlx
- Drug_exposure/norm_drug_exposure.sqlx

Deprecated/shadowed:
- Procedure/deprec_procedure_occurrence.sqlx
- episode/artificialProcedure/artProcMap.sqlx

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
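
The "zero refs" criterion can be reproduced with a small Python sketch that scans every .sqlx file for Dataform `ref("...")` calls and reports files whose name is never referenced. It assumes each action is named after its file stem (no `name:` override in the config block) and recognises only the single-argument `ref("name")` form; both are simplifications.

```python
import re
from pathlib import Path

# Matches the single-argument form ref("name") or ref('name').
REF_PATTERN = re.compile(r"""ref\(\s*["']([^"']+)["']\s*\)""")


def find_unreferenced(definitions_dir):
    """Return the stems of .sqlx files that no .sqlx file references via ref()."""
    files = sorted(Path(definitions_dir).rglob("*.sqlx"))
    referenced = set()
    for path in files:
        referenced.update(REF_PATTERN.findall(path.read_text()))
    # Assumes the action name equals the file stem (no `name:` override in config).
    return [path.stem for path in files if path.stem not in referenced]
```

A hit is only a hint: final CDM output tables are legitimately unreferenced, so each candidate still needs a manual check before deletion.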

condition_occurrence.condition_start_datetime is a DATETIME; cast it to
TIMESTAMP to match the other three branches of the UNION ALL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Flip ID assignment order (Disease → LoT → Surgery → Progression)
  so each child type can resolve episode_parent_id in the same pass
- LoT (32941) now gets episode_parent_id → Disease episode (32533)
  for the same patient via LEFT JOIN disease_with_id USING (person_id)
- Fix progression filter: outcome values in ANISH are SNAKE_CASE enums
  (PROGRESSIVE_DISEASE), not natural language; replace regex with an
  IN ('PROGRESSIVE_DISEASE', 'PROGRESSIVE DISEASE') check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
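
A minimal Python sketch of why the assignment order matters: when IDs are assigned parents-first (Disease → LoT → Surgery → Progression), every child can look up its parent's already-assigned ID in a single pass. The input shapes are hypothetical; the LoT → Disease lookup uses (person_id, disease) as in the later commit that joins on both keys, and attaching Surgery and Progression episodes to their LoT is an assumption based on where those records come from.

```python
from itertools import count

# Episode concept IDs as listed in this PR.
DISEASE, LOT, SURGERY, PROGRESSION = 32533, 32941, 32939, 32949
# Exact enum values accepted by the progression filter.
PROGRESSION_OUTCOMES = {"PROGRESSIVE_DISEASE", "PROGRESSIVE DISEASE"}


def build_episodes(diseases, lots, surgery_lot_ids, lot_outcomes):
    """Assign episode IDs parents-first so each child can resolve its parent.

    diseases:        [(person_id, disease_key)]
    lots:            [(lot_id, person_id, disease_key)]
    surgery_lot_ids: [lot_id], one entry per surgery (hypothetical shape)
    lot_outcomes:    [(lot_id, outcome)]
    """
    next_id = count(1)
    episodes = []

    def emit(concept_id, parent_id):
        episode_id = next(next_id)
        episodes.append({"episode_id": episode_id,
                         "episode_concept_id": concept_id,
                         "episode_parent_id": parent_id})
        return episode_id

    # 1. Disease episodes have no parent.
    disease_ids = {(p, d): emit(DISEASE, None) for p, d in diseases}
    # 2. LoT -> Disease for the same person and disease (None if absent, like a LEFT JOIN).
    lot_ids = {lot: emit(LOT, disease_ids.get((p, d))) for lot, p, d in lots}
    # 3. Surgery -> the LoT it was unnested from (assumed parent).
    for lot in surgery_lot_ids:
        emit(SURGERY, lot_ids.get(lot))
    # 4. Progression -> its LoT, only for exact enum matches.
    for lot, outcome in lot_outcomes:
        if outcome in PROGRESSION_OUTCOMES:
            emit(PROGRESSION, lot_ids.get(lot))
    return episodes
```

The progression filter is an exact, case-sensitive set-membership test, mirroring the SQL `IN (...)` check, so a free-text value such as "Progressive disease" is not matched.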

The id field (Firestore document ID) is NULL for 99.7% of records in
ANISH.linesOfTherapy (only 15 of 5168 rows are non-null), which caused
lot_base to return only 15 records. lineOfTherapyId is unique and
non-null across all 5168 records and is the correct primary key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
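
The primary-key diagnosis above amounts to two checks on a column: no NULLs and no duplicates. A minimal Python sketch over rows held as dicts, which stand in for the BigQuery table:

```python
def key_profile(rows, column):
    """Summarise NULLs and distinct values of one column across dict rows."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    total = len(values)
    return {
        "rows": total,
        "non_null": len(non_null),
        "null_pct": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "distinct_non_null": len(set(non_null)),
    }


def is_candidate_key(rows, column):
    """A usable primary key is non-null and unique on every row."""
    p = key_profile(rows, column)
    return p["rows"] > 0 and p["non_null"] == p["rows"] == p["distinct_non_null"]
```

With the figures in this commit (15 non-null values out of 5168 rows), `null_pct` for `id` comes to about 99.71%, which matches the reported 99.7%.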

Previously only MM was covered (3 SNOMED variants). Now creates one
Disease episode (32533) per patient per disease across all 6 types:
MM (437233), MDS (140352), AML (4147411), CML (133169), NHL (138994),
CLL (432574). MM variants (4047185, 4176952) are normalised to 437233.

LoT→Disease parent join updated to match on both person_id and
episode_object_concept_id so each LoT links to the correct disease.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
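
The normalisation and the one-episode-per-patient-per-disease rule can be sketched in Python as follows. The concept IDs are the ones listed in this commit; representing condition records as `(person_id, condition_concept_id, start_date)` tuples and taking the earliest condition date as the episode start are assumptions, not confirmed behaviour of episode.sqlx.

```python
# Disease concept IDs covered by this commit.
DISEASE_CONCEPTS = {
    437233: "MM", 140352: "MDS", 4147411: "AML",
    133169: "CML", 138994: "NHL", 432574: "CLL",
}
# MM variant concepts folded into the canonical MM concept.
MM_VARIANTS = {4047185: 437233, 4176952: 437233}


def disease_episodes(conditions):
    """Return {(person_id, disease_concept_id): start_date}, one per patient per disease.

    conditions: iterable of (person_id, condition_concept_id, start_date) tuples.
    Using the earliest condition date as the episode start is an assumption.
    """
    episodes = {}
    for person_id, concept_id, start in conditions:
        concept_id = MM_VARIANTS.get(concept_id, concept_id)
        if concept_id not in DISEASE_CONCEPTS:
            continue  # not one of the six target diseases
        key = (person_id, concept_id)
        if key not in episodes or start < episodes[key]:
            episodes[key] = start
    return episodes
```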

Resets the dqdashboard_results table, then runs all OHDSI Data Quality
Dashboard SQL checks in order against healthtree-production.dataform.
Logs pass/fail per file and prints a summary query at the end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
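
The runner's control flow (reset, run each file in order, log pass/fail, summarise) can be sketched in Python. `run_query` is a hypothetical callable standing in for the actual `bq query` invocation and is assumed to raise on failure; ordering by file name and the `reset_sql` parameter are also assumptions, and the summary here is a simple pass count rather than the summary query the commit mentions.

```python
from pathlib import Path


def run_checks(sql_dir, run_query, reset_sql=None, log=print):
    """Run every .sql file in sql_dir in file-name order and record PASS/FAIL.

    run_query is a hypothetical callable that executes one SQL string and raises
    on failure (for example a wrapper around `bq query`).
    """
    if reset_sql is not None:
        run_query(reset_sql)  # reset the results table first; let errors propagate
    results = {}
    for path in sorted(Path(sql_dir).glob("*.sql")):
        try:
            run_query(path.read_text())
            results[path.name] = "PASS"
        except Exception as exc:  # keep going so one bad check does not stop the run
            results[path.name] = f"FAIL: {exc}"
        log(f"{path.name}: {results[path.name]}")
    passed = sum(status == "PASS" for status in results.values())
    log(f"Summary: {passed}/{len(results)} files passed")
    return results
```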

When endDate < startDate in linesOfTherapy (inverted by up to 2223 days),
set episode_end_date/datetime to NULL rather than propagating the invalid
date. All records are kept with their start dates intact; the LoT is treated
as ongoing when its end date is demonstrably wrong. The same guard is applied
to surgery episodes (episode_events_child), which inherit lot_end.

Fixes EPISODE plausibleStartBeforeEnd DQD check (200 rows, 4.1%).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
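
The end-date guard reduces to a simple rule, sketched here in Python under the assumption that dates are already parsed; in SQL it would typically be a `CASE WHEN end < start THEN NULL ELSE end END` expression.

```python
from datetime import date
from typing import Optional


def guarded_end_date(start: date, end: Optional[date]) -> Optional[date]:
    """Drop an end date that precedes the start date, treating the episode as ongoing."""
    if end is not None and end < start:
        return None
    return end
```

Because surgery episodes inherit lot_end, applying the same function to the inherited value keeps the two episode types consistent.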

The previous implementation sourced dates only from the observation table,
missing all other clinical activity. It also lacked birth/death guards,
causing plausibleDuringLife DQD failures (56% violation rate on the end date).

New approach:
- Sources min/max dates across all 10 event tables
- Floors start date at birth date to prevent pre-birth periods
- Caps end date at death date to fix plausibleDuringLife violations
- Uses stable ROW_NUMBER() for observation_period_id

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
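
The new observation_period logic can be sketched in Python as below. Event dates are assumed to be pooled per person from all event tables beforehand; ordering the IDs by person_id and dropping periods that become empty after clamping are assumptions, since the commit only says that ROW_NUMBER() is used and does not describe that edge case.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def observation_periods(
    event_dates: Dict[int, List[date]],
    birth: Dict[int, date],
    death: Dict[int, Optional[date]],
) -> List[Tuple[int, int, date, date]]:
    """Build (observation_period_id, person_id, start, end) rows.

    event_dates holds every event date per person, pooled from all event tables.
    """
    periods = []
    for person_id in sorted(event_dates):  # deterministic order for stable IDs
        dates = event_dates[person_id]
        if not dates:
            continue
        start, end = min(dates), max(dates)
        if person_id in birth:
            start = max(start, birth[person_id])  # floor at birth
        if death.get(person_id) is not None:
            end = min(end, death[person_id])  # cap at death
        if start <= end:  # assumption: drop periods emptied by clamping
            periods.append((person_id, start, end))
    # Equivalent in spirit to ROW_NUMBER() OVER (ORDER BY person_id).
    return [(i, p, s, e) for i, (p, s, e) in enumerate(periods, start=1)]
```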

Documents three patients with likely incorrect death dates that account
for 98% of plausibleBeforeDeath violations, plus summaries of the two
ETL fixes already applied (inverted episode dates, observation period).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

BigQuery infers a bare `null` as INT64, causing INSERT failures into the
dqdashboard_results columns not_applicable_reason and notes_value, which
are declared as STRING. Pipe each SQL file through sed to apply an explicit
CAST(null AS STRING) before passing it to bq query.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
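
The sed rewrite can be expressed as a regular-expression substitution; here is a Python sketch. The pattern (a bare `null` aliased to one of the two STRING columns) is a hypothetical reconstruction, since the commit does not show the actual sed expression or the shape of the generated DQD SQL.

```python
import re

# Columns of dqdashboard_results declared as STRING (from the commit message).
STRING_COLUMNS = ("not_applicable_reason", "notes_value")

# Hypothetical pattern: a bare null aliased to one of those columns,
# with or without the AS keyword.
_NULL_ALIAS = re.compile(
    r"\bnull(?:\s+as)?\s+(" + "|".join(STRING_COLUMNS) + r")\b",
    re.IGNORECASE,
)


def cast_string_nulls(sql: str) -> str:
    """Rewrite `null AS col` to `CAST(NULL AS STRING) AS col` for STRING columns."""
    return _NULL_ALIAS.sub(lambda m: f"CAST(NULL AS STRING) AS {m.group(1)}", sql)
```

The rewrite is idempotent: an already-cast expression no longer matches, so running a file through it twice is harmless.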

Complete set of dbt SQL models and YAML schema files for OMOP CDM v5.4,
implementing a cleaner three-tier architecture (stg → int → final) with
cross-domain eviction, incremental-safe provider FK handling, multi-stream
unions per domain, and comprehensive column-level schema tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MERGE_PLAN.md: full comparative analysis of the existing Dataform pipeline
vs _newApproach, including architecture summaries, reinforcement inventory,
and a phased 42-item checklist for merging both without losing any logic.

ToDo.md: 45 Linear-ready tasks across 10 epics covering the full
Dataform-to-dbt migration, with dependency order noted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… and CLAUDE.md

- definitions/Additional/cdm_source.sqlx: new required OMOP CDM table with HealthTree metadata
- dataQualityTests/run_cdm_inspection.sh: bq-CLI replication of EHDEN CdmInspection (8 sections)
- dataQualityTests/run_cdm_inspection_r.r: R/JDBC script fixed (OAuthType=3 + EnableSession=1 for BigQuery)
- dataQualityTests/run_all_checks.sh: master script running DQD + CDM Inspection
- dataQualityTests/cdm_inspection_notes.md: setup notes + resolution of R/JDBC transaction error
- clinical_review/: 10 SQL audit queries (events before birth, post-death records, domain violations, etc.)
- CLAUDE.md: codebase guide for Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>