APP-356 Convert POTNTL_DUP_INV_SUM (Potential Duplicate Investigations) by krista-skylight · Pull Request #3131 · CDCgov/NEDSS-Modernization

krista-skylight · 2026-04-06T22:54:29Z

Description

Please include a summary of the changes and any key information a reviewer may need.

Tickets

Jira Ticket

Checklist for adding a library:

JordanGuinn

🚀

mcmcgrath13 · 2026-05-07T00:18:09Z

+      - column_name: ILLNESS_ONSET_DATE
+        type: string
+        data: None
+        null_percentage: 1.0


(q, nb): why include the column, but have everything be null?

I was just trying to mimic the original table as closely as possible, since I wasn't sure what downstream effects empty columns have on this library or on others that use this table.

For other fake tables, we've been including the columns that are definitely used and/or required and leaving the rest off. I think it's fine to include more, but if a column is included it should have reasonable values, whereas this one is in neither state.

mcmcgrath13 · 2026-05-07T00:18:55Z

+        data: f"{fake.date_between(start_date='-1y', end_date='today').strftime('%Y-%m-%d')} 12:00:00.000"
+        null_percentage: 0.05
+
+      - column_name: EVENT_DATE_TYPE


Should this be linked to whether EVENT_DATE is non-null? (Similar to how, in phdc, the state is dependent on state cd.)

oooh great point! I just modified accordingly

@mcmcgrath13 for some reason, this change plus a new snapshot is causing tests to fail in CI but pass locally 😢. Let me know if you have additional feedback. I'm going to work on fixing the CI issue before merging.

mcmcgrath13 · 2026-05-07T00:22:25Z


    # KLUDGE: NULL writing is not always correct
    result = result.replace(' nan,', ' NULL,')
+    result = result.replace('nan', ' NULL')


(q, nb): why did we need to add this one? There's a risk that a valid part of a string with nan in it will now be turned into NULL. Is it the opening paren case of (nan,?

I remember adding some of these to keep get_faker_sql from breaking on my data, but now that I'm trying it again without that line, it still works, so I have removed it!
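To make the risk raised in this thread concrete, here is a minimal Python sketch. The sample row and the variable names are made up for illustration and are not taken from conftest.py. It compares the existing `' nan,'` replacement, the broader `'nan'` replacement that was removed, and a token-bounded regex as one possible alternative.

```python
import re

# Hypothetical rendered VALUES row; not real output from get_faker_sql.
row = "(1, nan, 'Tenant finance', nan)"

# Existing kludge: only catches nan preceded by a space and followed by a comma,
# so the trailing nan before the closing paren is missed.
existing = row.replace(' nan,', ' NULL,')

# Broad replacement from the diff: also rewrites "nan" inside string values.
naive = row.replace('nan', ' NULL')

# Assumed alternative: only replace nan when it is not adjacent to a word
# character or a quote. This is still not quote-aware (a quoted value such as
# 'a nan b' would be rewritten), so treat it as a sketch, not a full fix.
safer = re.sub(r"(?<![\w'])nan(?![\w'])", "NULL", row)

print(existing)
print(naive)
print(safer)
```

A more robust route would be to turn missing values into SQL NULL before the row is rendered to a string, but whether that fits get_faker_sql is not something this thread settles.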

mcmcgrath13 · 2026-05-07T00:24:34Z

I only see the SAS output and not the python in the report catalog - can you update with the python?

Also, could you update the spreadsheet tracker and add the e2e ticket?

sonarqubecloud · 2026-05-07T22:39:52Z

❌ The last analysis has failed.

See analysis details on SonarQube Cloud

sonarqubecloud · 2026-05-07T23:44:51Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

mcmcgrath13 · 2026-05-11T13:52:27Z

+      - column_name: EVENT_DATE
+        type: string
+        data: f"{fake.date_between(start_date='-1y', end_date='today').strftime('%Y-%m-%d')} 12:00:00.000"
+        null_percentage: 0.05
+
+      - column_name: EVENT_DATE_TYPE
+        type: string
+        data: |
+          (
+            lambda ed: None if ed is None else random.choice([
+              "Investigation Start Date", 
+              "Date of Report", 
+              "Specimen Collection Date of Earliest Associated Lab", 
+              "Illness Onset Date", 
+              "Date of Diagnosis"
+            ])
+          )(EVENT_DATE)
+        null_percentage: 0


tablefaker applies null_percentage after generating the data, so the lambda never sees a null EVENT_DATE. For this to do what you want, don't use null_percentage in the EVENT_DATE spec; instead do something like blah if random.random() < .05 else blah
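The suggestion above can be sketched in plain Python: decide nullness while generating EVENT_DATE, then derive EVENT_DATE_TYPE from the generated value. The helper names `make_event_date` and `make_event_date_type` are hypothetical, and the standard library stands in for `fake.date_between`. How the expression is wired into the tablefaker spec is left out.

```python
import random
from datetime import date, timedelta

EVENT_DATE_TYPES = [
    "Investigation Start Date",
    "Date of Report",
    "Specimen Collection Date of Earliest Associated Lab",
    "Illness Onset Date",
    "Date of Diagnosis",
]


def make_event_date(rng, null_rate=0.05):
    """Hypothetical helper: None about null_rate of the time, else a timestamp string."""
    if rng.random() < null_rate:
        return None
    # Stand-in for fake.date_between(start_date='-1y', end_date='today').
    day = date.today() - timedelta(days=rng.randint(0, 365))
    return f"{day.strftime('%Y-%m-%d')} 12:00:00.000"


def make_event_date_type(event_date, rng):
    """Hypothetical helper: the type is null exactly when the date is null."""
    return None if event_date is None else rng.choice(EVENT_DATE_TYPES)


rng = random.Random(0)  # seeded so the sketch is repeatable
rows = []
for _ in range(1000):
    event_date = make_event_date(rng)
    rows.append((event_date, make_event_date_type(event_date, rng)))

# Every row keeps the two columns in sync.
print(all((d is None) == (t is None) for d, t in rows))
```

Because the null decision is made inside the generator, the dependent column sees the final value. Nulling EVENT_DATE in a later pass would leave EVENT_DATE_TYPE populated on rows whose date ends up null.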

mcmcgrath13 · 2026-05-11T13:57:26Z

Looking at the comparison test files, the date formatting looks very different - is this expected?

Krista Chan and others added 30 commits April 1, 2026 13:44

chore: initial copies of new library modules

60e01c0

Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …

4970f48

…into kc/convert_POTNTL_DUP_INV_SUM merge branches

feat: tablefaker schema for dup_inv_sum and initial conversion and tests

a8399b9

chore: add changelog and sql file for potntl dup inv

a62cbfe

chore: merge main branch

c73d401

Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …

e7ad140

…into kc/convert_POTNTL_DUP_INV_SUM merge main

chore: switch to rdb database

fdce598

chore: more nan handling and add INVESTIGATION_KEY

fae8225

chore: new snapshot

fad7088

chore: changes from main

ba5b29b

chore: try getting rid of datetime to pass tests

b01e674

Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …

a7a9b99

…into kc/convert_POTNTL_DUP_INV_SUM merge from main

tests: data type fixes and typo fixes to make tests run

1aabe0d

chore: fixes to subheader, add all needed columns, handle days_value …

44e6f1c

…in report spec

tests: fix 30 day assertions. switch to triple quotes

0679b42

chore: if no fktables

09ffc64

tests: rewrite without the extra fields

3fab55c

chore: change fk_table logic and go back to 3650 days default

ecd93e6

chore: pull main merge changes

501788f

Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …

f6610c0

…into kc/convert_POTNTL_DUP_INV_SUM merge with main

chore: changes to get sas to work

f2fd4bb

chore: rework without TimeRange

3349cd9

chore: rename files

22764c7

chore: more renaming

f64f9a9

tests: tablefaker schema matches actual data

e375a87

chore: date formats that actually work

e6359e9

chore: change migration filenames

ca04489

chore: bring in changes from main

7f3f99d

Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …

d239bc3

…into kc/convert_POTNTL_DUP_INV_SUM merge changes from main

Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …

9ba225d

…into kc/convert_POTNTL_DUP_INV_SUM merge from main

Krista Chan added 5 commits May 6, 2026 12:10

Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …

86f46a3

…into kc/convert_POTNTL_DUP_INV_SUM merge main

tests: add a negative days value test

6bb51ec

tests: remove small days value test

3a0e74d

tests: remove disease filter test, change to >

955a6b8

chore: linter fixes

756f107

krista-skylight requested review from JordanGuinn and mcmcgrath13 May 6, 2026 19:31

JordanGuinn approved these changes May 6, 2026

mcmcgrath13 reviewed May 7, 2026

Comment thread apps/report-execution/tests/conftest.py Outdated

Krista Chan and others added 7 commits May 7, 2026 14:52

chore: make event date type null if event date is null

e4250fc

Update apps/report-execution/tests/conftest.py

6a8c6e2

Co-authored-by: Mary McGrath <m.c.mcgrath13@gmail.com>

chore: remove 'nan' null replacement

922d9a2

Merge branch 'kc/convert_POTNTL_DUP_INV_SUM' of https://github.com/CD…

aba9306

…Cgov/NEDSS-Modernization into kc/convert_POTNTL_DUP_INV_SUM pull upstream

Delete apps/report-execution/tests/integration/libraries/snapshots/po…

d759414

…tntl_dup_inv_sum/test_execute_report_with_days_value/snapshot.yml

Delete apps/report-execution/tests/integration/libraries/snapshots/nb…

158a8aa

…s_potntl_dup_inv_sum/test_execute_report_with_days_value/snapshot.yml

chore: reload snapshot

f29a95c

Krista Chan added 6 commits May 7, 2026 15:42

tests: run snapshot update in ci

88a743c

chore: undo snapshot update

d0da3d3

tests: change to test execute report check data

46bc66e

try again with snapshot update

d76d0df

upload shapshot as artifact

cb25633

undo ci changes

d4cb61f

mcmcgrath13 reviewed May 11, 2026
