FE-400 - Lungmap modifications#26
Open
JoshuaFortriede wants to merge 2 commits into
Check whether the drs_uri key is in the file_descriptor of the sequence file. If it is, the FASTQ file is assumed to be hosted externally and will eventually get a DRS URI, so do not mark the data file as missing.
Changed to just get last split.
ncalvanese1
reviewed
May 9, 2025
# Sequence file data_files might not be present if they are managed access.
# File Descriptor v2.1.0 allows for the drs_uri to be a string or null.
# In both of these cases, we set found_data_file to True
if metadata_file["entity_type"] == "sequence_file":
Collaborator
There was a problem hiding this comment.
I don't think this is necessarily limited to sequencing files in the spec, though that is the use case for LungMap.
Author
There was a problem hiding this comment.
That is correct. This could be changed to
if metadata_file["entity_type"].endswith("_file"):
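A minimal sketch of how that generalized check might look, assuming `metadata_file` and `file_descriptor` are plain dicts parsed from the JSON documents. The helper name `is_externally_hosted` is hypothetical and not part of this PR; the presence of the `drs_uri` key (string or null) is what counts, as described in the code comment under review.

```python
def is_externally_hosted(metadata_file: dict, file_descriptor: dict) -> bool:
    """Return True when any *_file entity declares a drs_uri key.

    Hypothetical helper: both a string value and an explicit null count,
    because File Descriptor v2.1.0 allows either.
    """
    entity_type = metadata_file.get("entity_type", "")
    # Generalize from "sequence_file" to every entity type ending in "_file".
    return entity_type.endswith("_file") and "drs_uri" in file_descriptor
```

With this, a descriptor such as `{"drs_uri": None}` on an `analysis_file` would be treated the same way as one on a `sequence_file`.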
Contributor
@JoshuaFortriede this looks great - thank you!
Author
No, not really. The bucket that we have is a "sharing" bucket and not a "staging" bucket, so it doesn't have the staging_area.json files... @ncalvanese1 might be able to share with you an appropriate staging bucket.
This PR completes two tasks.
Lookup of DRS_URI in file_descriptors:
According to v2.1.0 of the HCA schema file_descriptors, a DRS URI can be provided for a file. This means that the actual data file will not be present in the payload, but will be linked externally. As such, we should not flag these data files as being missing in the submission.
Note for the future: it would be good to check whether there is BOTH a DRS URI and the data_file. If so, you might throw an error.
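The follow-up check suggested in the note could be sketched roughly as below. This is not what the PR implements; the function name `resolve_found_data_file`, the `data_file_present` flag, and the choice of `ValueError` are all assumptions made for illustration.

```python
def resolve_found_data_file(file_descriptor: dict, data_file_present: bool) -> bool:
    """Decide whether a data file should count as found.

    Hypothetical sketch: raises if a non-null drs_uri and a local data file
    are both present, since the file should live in only one place.
    """
    has_drs_key = "drs_uri" in file_descriptor
    has_drs_value = bool(file_descriptor.get("drs_uri"))

    if has_drs_value and data_file_present:
        raise ValueError("file has both a drs_uri and a data file in the payload")

    # A drs_uri key (string or null) means the file is linked externally,
    # so it is not flagged as missing.
    return data_file_present or has_drs_key
```

Whether a null `drs_uri` alongside a local data file should also be an error is left open here; the sketch only rejects the case where a real URI is given.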
Completion of TODO Item:
There was a note to remove an unused variable. This has been accomplished.