APP-356 Convert POTNTL_DUP_INV_SUM (Potential Duplicate Investigations)#3131
APP-356 Convert POTNTL_DUP_INV_SUM (Potential Duplicate Investigations)#3131krista-skylight wants to merge 63 commits into
Conversation
…into kc/convert_POTNTL_DUP_INV_SUM merge branches
…into kc/convert_POTNTL_DUP_INV_SUM merge main
…into kc/convert_POTNTL_DUP_INV_SUM merge from main
…into kc/convert_POTNTL_DUP_INV_SUM merge with main
…into kc/convert_POTNTL_DUP_INV_SUM merge changes from main
…into kc/convert_POTNTL_DUP_INV_SUM merge from main
| - column_name: ILLNESS_ONSET_DATE | ||
| type: string | ||
| data: None | ||
| null_percentage: 1.0 |
There was a problem hiding this comment.
(q, nb): why include the column, but have everything be null?
There was a problem hiding this comment.
I was just trying to mimic everything I saw in the original table as much as possible since I wasn't sure what downstream effects empty columns have on this library or other ones using this table
There was a problem hiding this comment.
for other fake tables, we've been including the columns that are definitely used and/or required, but otherwise leaving things off. I think it's fine to include more, but if a column is included it should have reasonable values, whereas this one is in neither state
| data: f"{fake.date_between(start_date='-1y', end_date='today').strftime('%Y-%m-%d')} 12:00:00.000" | ||
| null_percentage: 0.05 | ||
|
|
||
| - column_name: EVENT_DATE_TYPE |
There was a problem hiding this comment.
should this be linked off of the non-null ness of EVENT_DATE? (similar to how in phdc the state is dependent on state cd)
There was a problem hiding this comment.
oooh great point! I just modified accordingly
There was a problem hiding this comment.
@mcmcgrath13 for some reason with this change and a new snapshot is causing tests to fail in CI but pass locally 😢. Let me know if you have additional feedback. I'm going to work on fixing the CI issue before merging.
|
|
||
| # KLUDGE: NULL writing is not always correct | ||
| result = result.replace(' nan,', ' NULL,') | ||
| result = result.replace('nan', ' NULL') |
There was a problem hiding this comment.
(q, nb): why did we need to add this one? there's a risk that a valid part of a string with nan in it ill now be turned into NULL is it the opening paren case of (nan,?
There was a problem hiding this comment.
I remember adding some of these to get get_faker_sql to not break for my data but now that I'm trying it again without that line, it still does work, so I have removed it!
Co-authored-by: Mary McGrath <m.c.mcgrath13@gmail.com>
…Cgov/NEDSS-Modernization into kc/convert_POTNTL_DUP_INV_SUM pull upstream
…tntl_dup_inv_sum/test_execute_report_with_days_value/snapshot.yml
…s_potntl_dup_inv_sum/test_execute_report_with_days_value/snapshot.yml
|
❌ The last analysis has failed. |
|
| - column_name: EVENT_DATE | ||
| type: string | ||
| data: f"{fake.date_between(start_date='-1y', end_date='today').strftime('%Y-%m-%d')} 12:00:00.000" | ||
| null_percentage: 0.05 | ||
|
|
||
| - column_name: EVENT_DATE_TYPE | ||
| type: string | ||
| data: | | ||
| ( | ||
| lambda ed: None if ed is None else random.choice([ | ||
| "Investigation Start Date", | ||
| "Date of Report", | ||
| "Specimen Collection Date of Earliest Associated Lab", | ||
| "Illness Onset Date", | ||
| "Date of Diagnosis" | ||
| ]) | ||
| )(EVENT_DATE) | ||
| null_percentage: 0 |
There was a problem hiding this comment.
table faker nullifies after generating the data, so for this to do what you want, you need to not use null_percentage in the EVENT_DATE spec and instead do something like blah if random.random() < .05 else blah
|
Looking at the comparison test files, the date formatting looks very different - is this expected? |




Description
Please include a summary of the changes and any key information a reviewer may need.
Tickets
Checklist for adding a library:
apps/modernization-api/src/main/resources/db/changelog/report/execution/03_ODSE_Data_Report_Library_Init.sqlapps/modernization-api/src/main/resources/db/report/execution/librariesnamed<your library name>.sqlapps/modernization-api/src/main/resources/db/report-execution-changelog.yml. It should be added to the latestchangeSetsince the last release - this could require a newchangeSetif there isn't one since last releaseapps/report-execution/src/librariesnamed in lowercase, but generally following the naming convention of SAS (needs to be human recognizable as the same library)executefunction is the required method and its signature will (someday) be checked for validityapps/report-execution/tests/libraries/<your_library_here>pyfile following the conventions established for other librariessubset_sqlcan be assumed to be well tested by the modernization-api, so focus on any logic and additional joins/queries/analysis that is added in the library