Add per-asset dataStandard, HED standard, and extensions to StandardsType#371
Add per-asset dataStandard, HED standard, and extensions to StandardsType#371yarikoptic wants to merge 2 commits intomasterfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #371 +/- ##
==========================================
+ Coverage 97.91% 97.93% +0.02%
==========================================
Files 18 18
Lines 2401 2427 +26
==========================================
+ Hits 2351 2377 +26
Misses 50 50
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…s on StandardsType - Bump DANDI_SCHEMA_VERSION to 0.8.0, add 0.7.0 to ALLOWED_INPUT_SCHEMAS - Add `dataStandard: Optional[List[StandardsType]]` to BareAsset for per-asset standard declarations (NWB, BIDS, HED, OME/NGFF) - Add `version: Optional[str]` to StandardsType - Add `extensions: Optional[List["StandardsType"]]` (self-referencing) to StandardsType for NWB ndx-* extensions, HED library schemas, etc. - Add `hed_standard` constant (RRID:SCR_014074) - Collect per-asset dataStandard in aggregate_assets_summary(), with deprecated path/encoding heuristic fallbacks (remove after 2026-12-01) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cklist Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| dataStandard: Optional[List[StandardsType]] = Field( | ||
| None, | ||
| description="Data standard(s) applicable to this asset.", | ||
| json_schema_extra={"readOnly": True}, |
There was a problem hiding this comment.
readOnly should set to true only in the case that the dataStandard field is assigned and managed solely by the server per https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-00#rfc.section.9.4. Is it the case that the dataStandard field is intended to be assigned and managed solely by the server?
There was a problem hiding this comment.
I would guess the answer is no -- that it is intended to be extracted from the source files by the DANDI CLI running client-side.
There was a problem hiding this comment.
INTERESTING! We also have readOnly on above "TODO" attributes which I guess nobody ever assigned to any asset. Although we are indeed to populate it from client (or on server) but since we have other similar wasGeneratedBy also editable, I see no point in making this readOnly, and I guess it is really our own thing, hence
| json_schema_extra={"readOnly": True}, | |
| json_schema_extra={"nskey": DANDI_NSKEY}, |
right?
|
@yarikoptic this looks great! It will be exciting to be able to query the archive for NWB version and extension usage for data going forward. I agree with @candleindark that this probably should not be readOnly. Other than that, looks good. It would be great to also see a draft of dandi-cli that extracts this information. I'd prefer to have a draft of that before merging this, in case we are able to catch any edge cases. |
It is there waiting for you |
candleindark
left a comment
There was a problem hiding this comment.
Left some minor suggestions for change
| dataStandard: Optional[List[StandardsType]] = Field( | ||
| None, | ||
| description="Data standard(s) applicable to this asset.", | ||
| json_schema_extra={"readOnly": True}, |
There was a problem hiding this comment.
Shouldn't we have an "nskey" key in the json_schema_extra dict? If dataStandard is not defined in a known ontology, we I think we should use ours, i.e.,
json_schema_extra={"nskey": DANDI_NSKEY}
| ) | ||
| extensions: Optional[List["StandardsType"]] = Field( | ||
| None, | ||
| description="Extensions to the standard used in this dataset " |
There was a problem hiding this comment.
| description="Extensions to the standard used in this dataset " | |
| description="Extensions to the standard used" |
The standard extended can be just one applicable to an asset, not an entire dataset per definition of dataStandard in BareAsset.
| description="Version of the standard used.", | ||
| json_schema_extra={"nskey": "schema"}, | ||
| ) | ||
| extensions: Optional[List["StandardsType"]] = Field( |
There was a problem hiding this comment.
| extensions: Optional[List["StandardsType"]] = Field( | |
| extensions: Optional[List[StandardsType]] = Field( |
With from __future__ import annotations at the beginning of the file, you don't need the type to be wrapped in a string.
| extensions: Optional[List["StandardsType"]] = Field( | ||
| None, | ||
| description="Extensions to the standard used in this dataset " | ||
| "(e.g. NWB extensions like ndx-*, HED library schemas).", |
There was a problem hiding this comment.
Shouldn't we set "nskey" key in the json_schema_extra dict? If extensions is not defined in a known ontology, we I think we should use ours, i.e.,
json_schema_extra={"nskey": DANDI_NSKEY}
| identifier="DOI:10.25504/FAIRsharing.9af712", | ||
| ).model_dump(mode="json", exclude_none=True) | ||
|
|
||
| hed_standard = StandardsType( |
There was a problem hiding this comment.
Defining hed_standard in lower case is consistent with the established pattern, as in ome_ngff_standard. However, all the standard variables, nwb_standard, bids_standard, ome_ngff_standard, and hed_standard should be specified in upper case laters since they function as constants.
I think that should be done in a separate PR though.
|
I approve the dandi dli PR. Once we respond to @candleindark 's comments, this is good to go from my end |
Summary
dataStandard: Optional[List[StandardsType]]field toBareAssetso dandi-cli can declare per-asset data standards (NWB, BIDS, HED, OME/NGFF) that flow through toAssetsSummaryduring aggregationversion: Optional[str]field toStandardsTypeto track standard versions (e.g. NWB 2.7.0, BIDS 1.9.0, HED 8.2.0)extensions: Optional[List[StandardsType]]self-referencing field toStandardsTypefor standard extensions (NWB ndx-* extensions, HED library schemas)hed_standardconstant alongside existingnwb_standard,bids_standard,ome_ngff_standarddataStandardentries inaggregate_assets_summary(), with deprecated heuristic fallbacks for older clients (remove after 2026-12-01)dataStandardinmigrate()_FIELDS_INTRODUCEDfor downgrade pruningCLAUDE.mdwith architecture docs and schema change checklistCompanion dandi-cli PR: dandi/dandi-cli#XXXX
Test plan
test_hed_standard_structure— verifies hed_standard has correct name, identifier, schemaKeytest_aggregate_per_asset_datastandard— HED declared per-asset flows to summarytest_aggregate_per_asset_datastandard_no_duplication— BIDS not duplicated when declared both per-asset and via heuristicPR in dandi-cli:
🤖 Generated with Claude Code