Skip to content

Add per-asset dataStandard, HED standard, and extensions to StandardsType#371

Open
yarikoptic wants to merge 2 commits intomasterfrom
enh-hed
Open

Add per-asset dataStandard, HED standard, and extensions to StandardsType#371
yarikoptic wants to merge 2 commits intomasterfrom
enh-hed

Conversation

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Feb 24, 2026

Summary

  • Add dataStandard: Optional[List[StandardsType]] field to BareAsset so dandi-cli can declare per-asset data standards (NWB, BIDS, HED, OME/NGFF) that flow through to AssetsSummary during aggregation
  • Add version: Optional[str] field to StandardsType to track standard versions (e.g. NWB 2.7.0, BIDS 1.9.0, HED 8.2.0)
  • Add extensions: Optional[List[StandardsType]] self-referencing field to StandardsType for standard extensions (NWB ndx-* extensions, HED library schemas)
  • Add hed_standard constant alongside existing nwb_standard, bids_standard, ome_ngff_standard
  • Collect per-asset dataStandard entries in aggregate_assets_summary(), with deprecated heuristic fallbacks for older clients (remove after 2026-12-01)
  • Register dataStandard in migrate() _FIELDS_INTRODUCED for downgrade pruning
  • Comprehensive CLAUDE.md with architecture docs and schema change checklist

Companion dandi-cli PR: dandi/dandi-cli#XXXX

Test plan

  • test_hed_standard_structure — verifies hed_standard has correct name, identifier, schemaKey
  • test_aggregate_per_asset_datastandard — HED declared per-asset flows to summary
  • test_aggregate_per_asset_datastandard_no_duplication — BIDS not duplicated when declared both per-asset and via heuristic
  • All 243 existing tests pass

PR in dandi-cli:

🤖 Generated with Claude Code

@codecov
Copy link

codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.93%. Comparing base (e3ea45b) to head (571978e).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #371      +/-   ##
==========================================
+ Coverage   97.91%   97.93%   +0.02%     
==========================================
  Files          18       18              
  Lines        2401     2427      +26     
==========================================
+ Hits         2351     2377      +26     
  Misses         50       50              
Flag Coverage Δ
unittests 97.93% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

yarikoptic and others added 2 commits March 3, 2026 11:33
…s on StandardsType

- Bump DANDI_SCHEMA_VERSION to 0.8.0, add 0.7.0 to ALLOWED_INPUT_SCHEMAS
- Add `dataStandard: Optional[List[StandardsType]]` to BareAsset for
  per-asset standard declarations (NWB, BIDS, HED, OME/NGFF)
- Add `version: Optional[str]` to StandardsType
- Add `extensions: Optional[List["StandardsType"]]` (self-referencing)
  to StandardsType for NWB ndx-* extensions, HED library schemas, etc.
- Add `hed_standard` constant (RRID:SCR_014074)
- Collect per-asset dataStandard in aggregate_assets_summary(), with
  deprecated path/encoding heuristic fallbacks (remove after 2026-12-01)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cklist

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dataStandard: Optional[List[StandardsType]] = Field(
None,
description="Data standard(s) applicable to this asset.",
json_schema_extra={"readOnly": True},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readOnly should set to true only in the case that the dataStandard field is assigned and managed solely by the server per https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-00#rfc.section.9.4. Is it the case that the dataStandard field is intended to be assigned and managed solely by the server?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess the answer is no -- that it is intended to be extracted from the source files by the DANDI CLI running client-side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INTERESTING! We also have readOnly on above "TODO" attributes which I guess nobody ever assigned to any asset. Although we are indeed to populate it from client (or on server) but since we have other similar wasGeneratedBy also editable, I see no point in making this readOnly, and I guess it is really our own thing, hence

Suggested change
json_schema_extra={"readOnly": True},
json_schema_extra={"nskey": DANDI_NSKEY},

right?

@bendichter
Copy link
Member

@yarikoptic this looks great! It will be exciting to be able to query the archive for NWB version and extension usage for data going forward. I agree with @candleindark that this probably should not be readOnly. Other than that, looks good. It would be great to also see a draft of dandi-cli that extracts this information. I'd prefer to have a draft of that before merging this, in case we are able to catch any edge cases.

@yarikoptic
Copy link
Member Author

It would be great to also see a draft of dandi-cli that extracts this information

It is there waiting for you

Copy link
Member

@candleindark candleindark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor suggestions for change

dataStandard: Optional[List[StandardsType]] = Field(
None,
description="Data standard(s) applicable to this asset.",
json_schema_extra={"readOnly": True},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have an "nskey" key in the json_schema_extra dict? If dataStandard is not defined in a known ontology, we I think we should use ours, i.e.,

json_schema_extra={"nskey": DANDI_NSKEY}

)
extensions: Optional[List["StandardsType"]] = Field(
None,
description="Extensions to the standard used in this dataset "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description="Extensions to the standard used in this dataset "
description="Extensions to the standard used"

The standard extended can be just one applicable to an asset, not an entire dataset per definition of dataStandard in BareAsset.

description="Version of the standard used.",
json_schema_extra={"nskey": "schema"},
)
extensions: Optional[List["StandardsType"]] = Field(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
extensions: Optional[List["StandardsType"]] = Field(
extensions: Optional[List[StandardsType]] = Field(

With from __future__ import annotations at the beginning of the file, you don't need the type to be wrapped in a string.

extensions: Optional[List["StandardsType"]] = Field(
None,
description="Extensions to the standard used in this dataset "
"(e.g. NWB extensions like ndx-*, HED library schemas).",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we set "nskey" key in the json_schema_extra dict? If extensions is not defined in a known ontology, we I think we should use ours, i.e.,

json_schema_extra={"nskey": DANDI_NSKEY}

identifier="DOI:10.25504/FAIRsharing.9af712",
).model_dump(mode="json", exclude_none=True)

hed_standard = StandardsType(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining hed_standard in lower case is consistent with the established pattern, as in ome_ngff_standard. However, all the standard variables, nwb_standard, bids_standard, ome_ngff_standard, and hed_standard should be specified in upper case laters since they function as constants.

I think that should be done in a separate PR though.

@bendichter
Copy link
Member

I approve the dandi dli PR. Once we respond to @candleindark 's comments, this is good to go from my end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants