Skip to content

APP-356 Convert POTNTL_DUP_INV_SUM (Potential Duplicate Investigations)#3131

Open
krista-skylight wants to merge 63 commits into
mainfrom
kc/convert_POTNTL_DUP_INV_SUM
Open

APP-356 Convert POTNTL_DUP_INV_SUM (Potential Duplicate Investigations)#3131
krista-skylight wants to merge 63 commits into
mainfrom
kc/convert_POTNTL_DUP_INV_SUM

Conversation

@krista-skylight
Copy link
Copy Markdown
Contributor

@krista-skylight krista-skylight commented Apr 6, 2026

Description

Please include a summary of the changes and any key information a reviewer may need.

Tickets

Checklist for adding a library:

  • Copy this checklist into your (draft) PR description
  • Check the existing description for the library in apps/modernization-api/src/main/resources/db/changelog/report/execution/03_ODSE_Data_Report_Library_Init.sql
    • If the report execution service is not yet in use by any STLT (even beta), you should update this file
    • Update the description in the library-specific migration
  • Add a migration to apps/modernization-api/src/main/resources/db/report/execution/libraries named <your library name>.sql
    • Copy another migration and update the variables at the top of the file
  • Add the migration to apps/modernization-api/src/main/resources/db/report-execution-changelog.yml. It should be added to the latest changeSet since the last release - this could require a new changeSet if there isn't one since last release
  • Check the migration works by building/running the containerized dev setup
  • Add a python library file to apps/report-execution/src/libraries named in lowercase, but generally following the naming convention of SAS (needs to be human recognizable as the same library)
  • Follow the pattern established in existing libraries to create your library. The execute function is the required method and its signature will (someday) be checked for validity
    • Make sure pertinent text embedded in the reports is included in the description of the report
    • Note in the doc string any deviation from the SAS
  • Keep an eye out for things in the sas library that are/could be centralized to pre- or post-processing or a util would be helpful for (e.g. common formatting needs)
  • Add integration tests in a apps/report-execution/tests/libraries/<your_library_here>py file following the conventions established for other libraries
    • The subset_sql can be assumed to be well tested by the modernization-api, so focus on any logic and additional joins/queries/analysis that is added in the library
    • The database used in integration testing is our standard golden db + seed used in other testing - if additional data is needed to test the report, add it to the seeding process
  • Add e2e test coverage for the report
    • If this step is not set up, create a ticket to track this as follow on work needed for this report
  • Check that the report has parity with the NBS 6 version using (process TBD)
    • If this step is not set up, create a ticket to track this as follow on work needed for this report
  • Keep an eye out for opportunities to:
    • Consolidate reports by using parameters (parameter design tbd)
    • Improve the report spec or result expectations
  • Update the SAS to Python tracking spreadsheet with information about the newly converted library
  • Update/clarify these instructions in the epic based on your experience

Krista Chan and others added 30 commits April 1, 2026 13:44
…into kc/convert_POTNTL_DUP_INV_SUM

merge branches
…into kc/convert_POTNTL_DUP_INV_SUM

merge main
…into kc/convert_POTNTL_DUP_INV_SUM

merge from main
…into kc/convert_POTNTL_DUP_INV_SUM

merge with main
…into kc/convert_POTNTL_DUP_INV_SUM

merge changes from main
…into kc/convert_POTNTL_DUP_INV_SUM

merge from main
Copy link
Copy Markdown
Contributor

@JordanGuinn JordanGuinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

- column_name: ILLNESS_ONSET_DATE
type: string
data: None
null_percentage: 1.0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(q, nb): why include the column, but have everything be null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just trying to mimic everything I saw in the original table as much as possible since I wasn't sure what downstream effects empty columns have on this library or other ones using this table

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for other fake tables, we've been including the columns that are definitely used and/or required, but otherwise leaving things off. I think it's fine to include more, but if a column is included it should have reasonable values, whereas this one is in neither state

data: f"{fake.date_between(start_date='-1y', end_date='today').strftime('%Y-%m-%d')} 12:00:00.000"
null_percentage: 0.05

- column_name: EVENT_DATE_TYPE
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be linked off of the non-null ness of EVENT_DATE? (similar to how in phdc the state is dependent on state cd)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh great point! I just modified accordingly

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcmcgrath13 for some reason with this change and a new snapshot is causing tests to fail in CI but pass locally 😢. Let me know if you have additional feedback. I'm going to work on fixing the CI issue before merging.

Comment thread apps/report-execution/tests/conftest.py Outdated

# KLUDGE: NULL writing is not always correct
result = result.replace(' nan,', ' NULL,')
result = result.replace('nan', ' NULL')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(q, nb): why did we need to add this one? there's a risk that a valid part of a string with nan in it ill now be turned into NULL is it the opening paren case of (nan,?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember adding some of these to get get_faker_sql to not break for my data but now that I'm trying it again without that line, it still does work, so I have removed it!

Comment thread apps/report-execution/tests/conftest.py Outdated
@mcmcgrath13
Copy link
Copy Markdown
Contributor

I only see the SAS output and not the python in the report catalog - can you update with the python?

Also, could you update the spreadsheet tracker and add the e2e ticket?

image

Krista Chan and others added 7 commits May 7, 2026 14:52
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 7, 2026

❌ The last analysis has failed.

See analysis details on SonarQube Cloud

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 7, 2026

Comment on lines +168 to +185
- column_name: EVENT_DATE
type: string
data: f"{fake.date_between(start_date='-1y', end_date='today').strftime('%Y-%m-%d')} 12:00:00.000"
null_percentage: 0.05

- column_name: EVENT_DATE_TYPE
type: string
data: |
(
lambda ed: None if ed is None else random.choice([
"Investigation Start Date",
"Date of Report",
"Specimen Collection Date of Earliest Associated Lab",
"Illness Onset Date",
"Date of Diagnosis"
])
)(EVENT_DATE)
null_percentage: 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table faker nullifies after generating the data, so for this to do what you want, you need to not use null_percentage in the EVENT_DATE spec and instead do something like blah if random.random() < .05 else blah

@mcmcgrath13
Copy link
Copy Markdown
Contributor

Looking at the comparison test files, the date formatting looks very different - is this expected?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants