Skip to content

feat: complete NWB export handler (conversion, validation, status flow)#91

Draft
tabedzki wants to merge 1 commit into
fix/nwb-export-handler-schemafrom
feat/nwb-export-handler-completion
Draft

feat: complete NWB export handler (conversion, validation, status flow)#91
tabedzki wants to merge 1 commit into
fix/nwb-export-handler-schemafrom
feat/nwb-export-handler-completion

Conversation

@tabedzki

Copy link
Copy Markdown
Contributor

Part of BrainCOGS/tech_calendar_streamlit#70. Stacked on #90 (base = fix/nwb-export-handler-schema).

PR #90 reconciled the handler to the schema but left the pipeline non-functional (conversion and validation were NotImplementedError stubs). This PR makes the cronjob handler actually run end-to-end.

What changed

  1. Recording/scan key derivation (nwb_export_handler.py + nwb_production_utils.py). The job record carries only the acquisition.Session key, not recording_id/scan keys, so the old {k: job[k] for k in recording.Recording.primary_key} collapsed to {} and matched every recording (vacuous pass). New recording_ids_for_session() helper resolves recording_id(s) via recording.Recording.BehaviorSession. Applied to both the ephys and imaging branches; imaging fails loud (linkage not yet confirmed) rather than passing vacuously.
  2. process_nwb_conversion now performs a real conversion instead of NotImplementedError. Conversion logic was extracted into a shared module u19_pipeline/nwb_export/conversion.py (build_source_data / query_metadata / run_conversion_to_file, ported from run_nwb_export.py) so the CLI and the cronjob share one code path. Input paths resolve via resolve_input_paths() (export_params or NWB_EXPORT_DATA_ROOT), failing with an actionable error when unresolved.
  3. process_validation no longer inserts a record then raises NotImplementedError (which duplicate-key-crashed on every retry). It runs a real HDF5 integrity check + optional NWB Inspector, records truthful pass/fail and counts, and inserts the row idempotently (replace on retry).
  4. Status flow reconciled to NwbExportStatusEnum (QUEUED→DATA_VALIDATION→PROCESSING→VALIDATION→[UPLOAD]→COMPLETED). The old handler treated status 3 as completion, conflicting with the enum (VALIDATION=3, COMPLETED=6) and run_nwb_export.py. Added process_upload for the optional DANDI stage: no-DANDI jobs go straight to COMPLETED; requested uploads use DandiUploadClient and fail loud if credentials are missing (never silently skipped).

Tests

python -m pytest tests/nwb_export -q -m "not with_db"193 passed, 19 deselected (the deselected ones require a live DataJoint DB). ruff check + format clean on the changed files.

Known follow-ups (intentionally not implemented — flagged rather than faked)

  • Imaging Scanacquisition.Session linkage is not reliably known, so the imaging branch fails loud with a TODO instead of guessing a key. Needs confirmation of the imaging schema linkage before imaging export can be validated/converted.
  • probe_numbers semantic mismatch: the Streamlit form stores selected processing job_ids in NwbExportModality.probe_numbers, but validate_ephys_data_exists treats those values as insertion_numbers. One side needs to change so they agree.
  • Data-path auto-discovery: resolve_input_paths currently uses export_params or a simple NWB_EXPORT_DATA_ROOT convention; a DB-driven resolver (raw behavior/Kilosort paths per session) would remove the manual step.
  • DANDI asset id: neuroconv's upload returns organised file paths, not a DANDI asset id, so dandi_asset_id is left NULL on success (a real asset-id lookup is a TODO).

🤖 Generated with Claude Code

… flow

Completes the cronjob export handler that PR #90 only reconciled to the schema.

1. Recording/scan key derivation (handler + nwb_production_utils): the job record
   carries only the acquisition.Session key, not recording_id/scan keys, so the
   old `{k: job[k] for k in recording.Recording.primary_key}` collapsed to {} and
   matched every recording (vacuous pass). Now resolves recording_id(s) via the
   new recording_ids_for_session() helper (recording.Recording.BehaviorSession).
   Applied to both the ephys and imaging branches; imaging fails loud (Scan<->
   session linkage not yet confirmed) instead of silently passing.

2. process_nwb_conversion: no longer raises NotImplementedError. Conversion logic
   is extracted into the shared module u19_pipeline/nwb_export/conversion.py
   (build_source_data / query_metadata / run_conversion_to_file, ported from
   run_nwb_export.py), so the CLI and cronjob share one code path. Input paths are
   resolved via resolve_input_paths() (export_params or NWB_EXPORT_DATA_ROOT) with
   an actionable error when unresolved.

3. process_validation: removed the broken stub that inserted a validation record
   then raised NotImplementedError (which duplicate-key-crashed on retry). Now runs
   a real HDF5 integrity check + optional NWB Inspector, records truthful results,
   and inserts the row idempotently (replace on retry).

4. Status flow reconciled to NwbExportStatusEnum (QUEUED->DATA_VALIDATION->
   PROCESSING->VALIDATION->[UPLOAD]->COMPLETED). The old handler treated status 3
   as completion, conflicting with the enum (VALIDATION=3, COMPLETED=6) and
   run_nwb_export.py. Added process_upload for the optional DANDI stage (no-DANDI
   jobs go straight to COMPLETED; requested uploads use DandiUploadClient and fail
   loud if credentials are missing).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant