Skip to content

Create functionality to extract LOINC Delta updates to generate additional embeddings#532

Open
BradySkylight wants to merge 13 commits into
mainfrom
brady/452-get-loinc-delta
Open

Create functionality to extract LOINC Delta updates to generate additional embeddings#532
BradySkylight wants to merge 13 commits into
mainfrom
brady/452-get-loinc-delta

Conversation

@BradySkylight
Copy link
Copy Markdown
Collaborator

@BradySkylight BradySkylight commented May 6, 2026

Description

This PR includes changes to make it possible to perform updates against, at least LOINC for now, various medical terminology value-sets utilized by TTC. Below is a list of the changes you will see:

  • Another refactor that moves the common functionality for All value-sets (general.py) as well as specific functionality for each of the different value-sets/ontologies into their own util python script.
  • The old valuset_sync script has been renamed to 'extraction' and leverages the utils functionality to perform a FULL extraction and saves the various files
  • The readme has been updated to indicate naming changes and location changes for various files/functionality
  • The embedding notebook was also changed to reflect the new order of all LOINC data pulled from LOINC extract so it will work correctly if ever used again.
    • NOTE: There are some formatting changes in this file for some reason and I'm not sure why - ignore these unless you notice an issue
  • Added a maintenance.py to handle maintaining the valueset extracts and embeddings for all valuesets - though only LOINC has been implemented so far for loinc lab names, as this is the only valueset loaded in our model at this point.

Related Issues

Closes #452

Additional Notes

I haven't added in tests, but have tested along the way and have some print statements included for now just to see the end result. The output will be passed to the next step - create the embeddings - which will then be stored as files somewhere and then uploaded into OpenSearch.

Checklist for Reviewers

Please review and complete the following checklist during the review process:

  • The code follows best practices and conventions.
  • The changes implement the desired functionality or fix the reported issue.
  • The tests cover the new changes and pass successfully.
  • Any potential edge cases or error scenarios have been considered.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.78%. Comparing base (9f49739) to head (9bae0c4).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
- Coverage   95.81%   95.78%   -0.04%     
==========================================
  Files          46       46              
  Lines        2344     2372      +28     
==========================================
+ Hits         2246     2272      +26     
- Misses         98      100       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@BradySkylight BradySkylight added this to the SNOINC milestone May 7, 2026
@BradySkylight BradySkylight self-assigned this May 7, 2026
@BradySkylight BradySkylight marked this pull request as ready for review May 7, 2026 18:16
Copy link
Copy Markdown
Collaborator

@robertandremitchell robertandremitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! my concerns are mostly about documentation and aligning code with the other packages modules.



# Value Set Directories
SNOINC_DIRECTORY = "./data/snoinc_extracts"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more general feedback on different points of these directories / folders, we should use an OS-agnostic initialization akin to what the validation works does here: https://github.com/CDCgov/dibbs-text-to-code/blob/main/packages/validation/src/validation/main.py#L7-L21

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to implement something like that. I do like/prefer the use of os vs. other packages though. I'll address this in the next PR, as it's in flight now for the embeddings.

UMLS_API_KEY = os.environ.get("UMLS_API_KEY")


def clean_text_string(value: str) -> str:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also more general, but we should have doc strings on these imo

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can add those to all the functions for sure.

hl7_rows = []

if hl7_response.status_code != 200:
print(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do envision this being pulled into the index-lambda or another service, we should use the logging library over printing

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This can be tackled in the next PR(s) as that is when we will start to wrap it in a lambda. For now, it's nice to have a way to see progress when running locally. But I'll add TODOs for each 'print' to convert to a logging statement instead.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create Process to get delta update for valueset for LOINC

3 participants