Update as of April 6th #4
Open
larryvaldes wants to merge 21 commits into
… and Drug_exposure. Unnesting for Procedure. Needs to work on Medication to improve coding mappings
- Rewrite lot_base.sqlx to source from ANISH.linesOfTherapy (replaces HT.linesOfTherapy); exposes lineOfTherapyId, diseaseId, listOfProcedures, stemCellProcedureDates, and outcome for downstream use
- Update episode_lines_parent.sqlx to map on diseaseId (ANISH GUID) instead of diseaseName
- Rewrite episode_events_child.sqlx to derive Cancer Surgery (32939) episodes by unnesting listOfProcedures STRUCT from ANISH; joins SNOMED codes to OMOP concept for episode_object_concept_id
- Update episode.sqlx to produce all four CERTAINTY episode types: Cancer Drug Treatment (32941), Disease Episode (32533) from condition_occurrence, Cancer Surgery (32939), and Progression (32949) from artOutcome; resolves episode_parent_id for surgery and progression episodes
- Add episode_event.sqlx linking all episode types to clinical events per CERTAINTY §3.2.11 (drug_exposure, procedure_occurrence, measurement, observation)
- Add OMOP_CDM_54_Reference.md with full CDM v5.4 table specs and conventions
- Add CERTAINTY_OMOP_codebook_v2.1.docx as project guideline reference
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Old episode pipeline (fully replaced by newEpisode/):
- episode/episodeOld.sqlx
- episode/episodeWIP.sqlx
- episode/episode_event.sqlx (old version)
- episode/episode_idx.sqlx
- episode/episode_edge.sqlx
- episode/episode_seeds.sqlx
- episode/lot_baseOld.sqlx
- episode/lot_plain.sqlx
- episode/lot_mapped.sqlx
- idx/medReq_idx.sqlx
Unused drug exposure variants (zero refs):
- Drug_exposure/new_drug_exposure.sqlx
- Drug_exposure/new_drug_exposure2.sqlx
- Drug_exposure/drug_exposure_sections.sqlx
- Drug_exposure/norm_drug_exposure.sqlx
Deprecated/shadowed:
- Procedure/deprec_procedure_occurrence.sqlx
- episode/artificialProcedure/artProcMap.sqlx
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
condition_occurrence.condition_start_datetime is DATETIME; cast to TIMESTAMP to match the other three branches of the UNION ALL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Flip ID assignment order (Disease → LoT → Surgery → Progression)
so each child type can resolve episode_parent_id in the same pass
- LoT (32941) now gets episode_parent_id → Disease episode (32533)
for the same patient via LEFT JOIN disease_with_id USING (person_id)
- Fix progression filter: outcome values in ANISH are SNAKE_CASE enums
(PROGRESSIVE_DISEASE), not natural language; replace regex with an
IN ('PROGRESSIVE_DISEASE', 'PROGRESSIVE DISEASE') check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The id (Firestore doc ID) field is NULL for 99.7% of records in ANISH.linesOfTherapy (only 15 of 5168 rows are non-null), causing lot_base to return only 15 records. lineOfTherapyId is unique and non-null across all 5168 records and is the correct primary key. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously only MM was covered (3 SNOMED variants). Now creates one Disease episode (32533) per patient per disease across all 6 types: MM (437233), MDS (140352), AML (4147411), CML (133169), NHL (138994), CLL (432574). MM variants (4047185, 4176952) are normalised to 437233. LoT→Disease parent join updated to match on both person_id and episode_object_concept_id so each LoT links to the correct disease. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
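The normalisation described in this commit can be sketched in plain Python. This is only an illustration of the mapping logic, not the sqlx model itself: the function names and the `(person_id, concept_id)` row shape are assumptions, while the concept IDs are the ones listed in the commit message.

```python
# Canonical disease concept IDs, as listed in the commit message.
DISEASE_CONCEPTS = {
    "MM": 437233,
    "MDS": 140352,
    "AML": 4147411,
    "CML": 133169,
    "NHL": 138994,
    "CLL": 432574,
}

# MM variants that the commit says are normalised to 437233.
MM_VARIANTS = {4047185, 4176952}


def normalise_disease_concept(concept_id):
    """Return the canonical disease concept ID, or None if not one of the six."""
    if concept_id in MM_VARIANTS:
        return DISEASE_CONCEPTS["MM"]
    if concept_id in DISEASE_CONCEPTS.values():
        return concept_id
    return None


def disease_episodes(condition_rows):
    """Collapse (person_id, concept_id) rows (hypothetical shape) to one
    entry per patient per disease."""
    seen = set()
    for person_id, concept_id in condition_rows:
        canonical = normalise_disease_concept(concept_id)
        if canonical is not None:
            seen.add((person_id, canonical))
    return sorted(seen)
```

Because each LoT carries the same canonical concept, joining on both `person_id` and the concept ID links it to exactly one Disease episode per patient.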
Resets dqdashboard_results table then runs all OHDSI Data Quality Dashboard SQL checks in order against healthtree-production.dataform. Logs pass/fail per file and prints summary query at the end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When endDate < startDate in linesOfTherapy (up to 2223 days inverted), set episode_end_date/datetime to NULL rather than propagating the invalid date. Preserves all records with start date intact; treats the LoT as ongoing when the end date is demonstrably wrong. Same guard applied to surgery (episode_events_child) which inherits lot_end. Fixes EPISODE plausibleStartBeforeEnd DQD check (200 rows, 4.1%). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
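As a rough illustration of the guard, here is the same rule in plain Python. The function name is hypothetical, and the actual fix lives in the SQL models; the sketch only shows the decision being made.

```python
from datetime import date


def guard_end_date(start, end):
    """Return None when the end date precedes the start date.

    The record keeps its start date; an inverted end date is dropped so the
    line of therapy reads as ongoing rather than carrying an invalid range.
    """
    if start is None or end is None:
        return end
    if end < start:
        return None
    return end
```

For example, `guard_end_date(date(2020, 1, 10), date(2019, 1, 1))` returns `None`, while a valid range passes its end date through unchanged.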
Previous implementation only sourced dates from the observation table, missing all other clinical activity. Also lacked death/birth guards, causing plausibleDuringLife DQD failures (56% violation rate on end date). New approach:
- Sources min/max dates across all 10 event tables
- Floors start date at birth date to prevent pre-birth periods
- Caps end date at death date to fix plausibleDuringLife violations
- Uses stable ROW_NUMBER() for observation_period_id
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
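The floor and cap rules can be sketched for a single patient as follows. This is a minimal Python illustration under stated assumptions: the function name and the flat list of event dates are hypothetical stand-ins for the min/max taken across the event tables in SQL.

```python
from datetime import date


def observation_period(event_dates, birth_date, death_date=None):
    """Derive (start, end) for one patient from all of their event dates.

    event_dates stands in for dates gathered across every event table.
    """
    if not event_dates:
        return None
    start = min(event_dates)
    end = max(event_dates)
    # Floor the start at birth so the period cannot begin before birth.
    if birth_date is not None and start < birth_date:
        start = birth_date
    # Cap the end at death so the period cannot extend past death.
    if death_date is not None and end > death_date:
        end = death_date
    return start, end
```

The sketch does not address a patient whose every event falls after the recorded death date; the commit message does not say how that case is handled.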
Documents three patients with likely incorrect death dates that account for 98% of plausibleBeforeDeath violations, plus summaries of the two ETL fixes already applied (inverted episode dates, observation period). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BigQuery infers bare `null` as INT64, causing INSERT failures into dqdashboard_results columns not_applicable_reason and notes_value which are declared as STRING. Pipe each SQL file through sed to apply explicit CAST(null AS STRING) before passing to bq query. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
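The rewrite applied by sed can be approximated in Python. This is a sketch under an assumption: it supposes the DQD SQL emits the two columns as `null as <column>`, which the commit message does not spell out, and the exact sed expression used in the script is not reproduced here.

```python
import re

# The two STRING columns named in the commit message.
STRING_COLUMNS = ("not_applicable_reason", "notes_value")

# Assumed shape of the offending SQL: `null as <column>`.
_BARE_NULL = re.compile(
    r"\bnull\s+as\s+(" + "|".join(STRING_COLUMNS) + r")\b",
    re.IGNORECASE,
)


def cast_bare_nulls(sql):
    """Replace bare nulls for the STRING columns with an explicit cast."""
    return _BARE_NULL.sub(r"CAST(null AS STRING) AS \1", sql)
```

The substitution is idempotent: text that already reads `CAST(null AS STRING) AS notes_value` does not match the pattern again, so running it twice is harmless.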
Complete set of dbt SQL models and YAML schema files for OMOP CDM v5.4, implementing a cleaner three-tier architecture (stg → int → final) with cross-domain eviction, incremental-safe provider FK handling, multi-stream unions per domain, and comprehensive column-level schema tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MERGE_PLAN.md: full comparative analysis of the existing Dataform pipeline vs _newApproach, including architecture summaries, reinforcement inventory, and a phased 42-item checklist for merging both without losing any logic. ToDo.md: 45 Linear-ready tasks across 10 epics covering the full Dataform-to-dbt migration, with dependency order noted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… and CLAUDE.md
- definitions/Additional/cdm_source.sqlx: new required OMOP CDM table with HealthTree metadata
- dataQualityTests/run_cdm_inspection.sh: bq-CLI replication of EHDEN CdmInspection (8 sections)
- dataQualityTests/run_cdm_inspection_r.r: R/JDBC script fixed (OAuthType=3 + EnableSession=1 for BigQuery)
- dataQualityTests/run_all_checks.sh: master script running DQD + CDM Inspection
- dataQualityTests/cdm_inspection_notes.md: setup notes + resolution of R/JDBC transaction error
- clinical_review/: 10 SQL audit queries (events before birth, post-death records, domain violations, etc.)
- CLAUDE.md: codebase guide for Claude Code
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
No description provided.