feat: complete NWB export handler (conversion, validation, status flow)#91
Draft
tabedzki wants to merge 1 commit into
Draft
feat: complete NWB export handler (conversion, validation, status flow)#91tabedzki wants to merge 1 commit into
tabedzki wants to merge 1 commit into
Conversation
… flow Completes the cronjob export handler that PR #90 only reconciled to the schema. 1. Recording/scan key derivation (handler + nwb_production_utils): the job record carries only the acquisition.Session key, not recording_id/scan keys, so the old `{k: job[k] for k in recording.Recording.primary_key}` collapsed to {} and matched every recording (vacuous pass). Now resolves recording_id(s) via the new recording_ids_for_session() helper (recording.Recording.BehaviorSession). Applied to both the ephys and imaging branches; imaging fails loud (Scan<-> session linkage not yet confirmed) instead of silently passing. 2. process_nwb_conversion: no longer raises NotImplementedError. Conversion logic is extracted into the shared module u19_pipeline/nwb_export/conversion.py (build_source_data / query_metadata / run_conversion_to_file, ported from run_nwb_export.py), so the CLI and cronjob share one code path. Input paths are resolved via resolve_input_paths() (export_params or NWB_EXPORT_DATA_ROOT) with an actionable error when unresolved. 3. process_validation: removed the broken stub that inserted a validation record then raised NotImplementedError (which duplicate-key-crashed on retry). Now runs a real HDF5 integrity check + optional NWB Inspector, records truthful results, and inserts the row idempotently (replace on retry). 4. Status flow reconciled to NwbExportStatusEnum (QUEUED->DATA_VALIDATION-> PROCESSING->VALIDATION->[UPLOAD]->COMPLETED). The old handler treated status 3 as completion, conflicting with the enum (VALIDATION=3, COMPLETED=6) and run_nwb_export.py. Added process_upload for the optional DANDI stage (no-DANDI jobs go straight to COMPLETED; requested uploads use DandiUploadClient and fail loud if credentials are missing). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Part of BrainCOGS/tech_calendar_streamlit#70. Stacked on #90 (base =
fix/nwb-export-handler-schema).PR #90 reconciled the handler to the schema but left the pipeline non-functional (conversion and validation were
NotImplementedErrorstubs). This PR makes the cronjob handler actually run end-to-end.What changed
nwb_export_handler.py+nwb_production_utils.py). The job record carries only theacquisition.Sessionkey, notrecording_id/scan keys, so the old{k: job[k] for k in recording.Recording.primary_key}collapsed to{}and matched every recording (vacuous pass). Newrecording_ids_for_session()helper resolves recording_id(s) viarecording.Recording.BehaviorSession. Applied to both the ephys and imaging branches; imaging fails loud (linkage not yet confirmed) rather than passing vacuously.process_nwb_conversionnow performs a real conversion instead ofNotImplementedError. Conversion logic was extracted into a shared moduleu19_pipeline/nwb_export/conversion.py(build_source_data/query_metadata/run_conversion_to_file, ported fromrun_nwb_export.py) so the CLI and the cronjob share one code path. Input paths resolve viaresolve_input_paths()(export_paramsorNWB_EXPORT_DATA_ROOT), failing with an actionable error when unresolved.process_validationno longer inserts a record then raisesNotImplementedError(which duplicate-key-crashed on every retry). It runs a real HDF5 integrity check + optional NWB Inspector, records truthful pass/fail and counts, and inserts the row idempotently (replace on retry).NwbExportStatusEnum(QUEUED→DATA_VALIDATION→PROCESSING→VALIDATION→[UPLOAD]→COMPLETED). The old handler treated status 3 as completion, conflicting with the enum (VALIDATION=3,COMPLETED=6) andrun_nwb_export.py. Addedprocess_uploadfor the optional DANDI stage: no-DANDI jobs go straight to COMPLETED; requested uploads useDandiUploadClientand fail loud if credentials are missing (never silently skipped).Tests
python -m pytest tests/nwb_export -q -m "not with_db"→ 193 passed, 19 deselected (the deselected ones require a live DataJoint DB). ruff check + format clean on the changed files.Known follow-ups (intentionally not implemented — flagged rather than faked)
Scan↔acquisition.Sessionlinkage is not reliably known, so the imaging branch fails loud with a TODO instead of guessing a key. Needs confirmation of the imaging schema linkage before imaging export can be validated/converted.probe_numberssemantic mismatch: the Streamlit form stores selected processingjob_ids inNwbExportModality.probe_numbers, butvalidate_ephys_data_existstreats those values asinsertion_numbers. One side needs to change so they agree.resolve_input_pathscurrently usesexport_paramsor a simpleNWB_EXPORT_DATA_ROOTconvention; a DB-driven resolver (raw behavior/Kilosort paths per session) would remove the manual step.dandi_asset_idis left NULL on success (a real asset-id lookup is a TODO).🤖 Generated with Claude Code