168 mid level diagrams for pipelines #169

Open
arielleleon wants to merge 12 commits into main from 168-mid-level-diagrams-for-pipelines

Conversation

@arielleleon arielleleon commented Jun 5, 2026

Collaborator

What this should show:

  1. Pipeline modularity based on modular data inputs
  2. Processing modularity: pipelines can have at least three capsules, for packaging, QC, and metadata generation (for the derived asset and the NWB file)
  3. Each pipeline outputs an NWB file that can be combined with the others for release later

@arielleleon arielleleon linked an issue Jun 5, 2026 that may be closed by this pull request
@arielleleon arielleleon requested a review from bruno-f-cruz June 5, 2026 00:02
@bruno-f-cruz

Member

Much better, but I would also highlight the modular organization of the "raw data", and ideally, the acquisition too.

@arielleleon

Collaborator Author

> Much better, but I would also highlight the modular organization of the "raw data", and ideally, the acquisition too.

Is this what you meant?

@bruno-f-cruz bruno-f-cruz left a comment

Member

  • What is aind schematized metadata? Why not call it aind-data-schema?
  • Metadata is not split by modality in s3/docdb. At that point, there is already a "merged" metadata.
  • I thought the thing you felt was missing from the previous diagram was a specific callout to which files are generated where. This diagram has the same issue, no? It should have a Rig schematic somewhere that says acquisition.json/instrument.json (these can be split by modality). The pipelines should say processing.json/qc.json, etc.
  • You should not call out "plots" specifically; these should always be called "artifacts". What if people want to save tables, sounds, videos, etc.?
  • It is not clear to me what processing.json -> Aggregate processing -> bracket is doing. Can you describe it to see if there is a better way to go about this?
  • Overall, I think the pink box is a bit too chaotic and could use more structure.

I think it would help if you added a few bullet points on what you think this diagram should be describing and what the major features of the architecture are that you want to highlight. I don't think the current diagram is super faithful to what I feel the current architecture is.

@arielleleon

Collaborator Author

>   • What is aind schematized metadata? Why not call it aind-data-schema?
>   • Metadata is not split by modality in s3/docdb. At that point, there is already a "merged" metadata.
>   • I thought the thing you felt was missing from the previous diagram was a specific callout to which files are generated where. This diagram has the same issue, no? It should have a Rig schematic somewhere that says acquisition.json/instrument.json (these can be split by modality). The pipelines should say processing.json/qc.json, etc.
>   • You should not call out "plots" specifically; these should always be called "artifacts". What if people want to save tables, sounds, videos, etc.?
>   • It is not clear to me what processing.json -> Aggregate processing -> bracket is doing. Can you describe it to see if there is a better way to go about this?
>   • Overall, I think the pink box is a bit too chaotic and could use more structure.
>
> I think it would help if you added a few bullet points on what you think this diagram should be describing and what the major features of the architecture are that you want to highlight. I don't think the current diagram is super faithful to what I feel the current architecture is.

  1. Addressed in commit 53f9efa
  2. That is what the large file around all the data called "Raw Data" is supposed to represent
  3. I did address this, by calling out which files the pipeline (not the rig) is generating
  4. Fixed in commit 8971cfe
  5. Let me think about how to make the processing.json aggregation clearer
  6. Yeah - let me see how I can make it more compact

I updated the PR description with my goals. To reiterate, the goal is to create a mid-level diagram that illustrates how pipeline processing is done.

@arielleleon

Collaborator Author

@bruno-f-cruz - I tried to do a better job on points 5 and 6

@bruno-f-cruz

Member

> @bruno-f-cruz - I tried to do a better job on points 5 and 6

Almost there. I think it is still misleading to have metadata per modality. That is something that SciComp has made quite a big deal of in the past: When data lands in the public bucket, metadata must be merged.

@arielleleon

Collaborator Author

> [...] metadata per modality. That is something that SciComp has made quite a big deal of in the past: When data lands in the public bucket [...]

But there will be metadata per modality. I am trying to represent the general metadata in each modality that gets used to make the final schema.

@arielleleon arielleleon requested a review from jtyoung84 June 10, 2026 20:43
@bruno-f-cruz

Member

> > [...] metadata per modality. That is something that SciComp has made quite a big deal of in the past: When data lands in the public bucket [...]
>
> but there will be metadata per modality. I am trying to represent the general metadata in each modality that gets used to make the final schema.

I don't follow. Formally, metadata merging is a strictly destructive operation. After merging, how do you know which metadata corresponds to which modality? For instance, how do you know to which modality a specific piece of hardware belongs?
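
To make the concern concrete, here is a minimal sketch of a naive merge. The field names (`devices`) and modality names are hypothetical and are not taken from aind-data-schema; the actual merge logic may well differ. It only illustrates how concatenating per-modality records can drop the link between a device and its modality:

```python
def merge_acquisitions(per_modality):
    """Naively merge per-modality metadata into a single record.

    `per_modality` maps a modality name to a dict with a "devices" list.
    Both the structure and the field names are hypothetical.
    """
    merged = {"devices": []}
    for modality in sorted(per_modality):
        merged["devices"].extend(per_modality[modality]["devices"])
    return merged


per_modality = {
    "behavior": {"devices": ["camera_0", "lick_sensor"]},
    "ecephys": {"devices": ["probe_A"]},
}

merged = merge_acquisitions(per_modality)
# The merged record lists every device, but the modality keys are gone,
# so nothing in `merged` says which modality "probe_A" came from.
print(merged)
```

Under these assumptions, recovering the modality of a device after the merge would require either keeping the per-modality records or tagging each device with its modality before merging.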

@arielleleon

Collaborator Author

> > > [...] metadata per modality. That is something that SciComp has made quite a big deal of in the past: When data lands in the public bucket [...]
> >
> > but there will be metadata per modality. I am trying to represent the general metadata in each modality that gets used to make the final schema.
>
> I don't follow. Formally, metadata merging is a strictly destructive operation. After merging, how do you know which metadata corresponds to which modality? For instance, how do you know to which modality a specific piece of hardware belongs?

Don't you merge your acquisition files?

@arielleleon

Collaborator Author

@bruno-f-cruz removed the modality metadata

@bruno-f-cruz

Member

> @bruno-f-cruz removed the modality metadata

[image]

This one needs to be removed too.

Also, if I am understanding correctly, the dashed line represents a zoomed-in version of the blue boxes. You may want to call that out with some sort of visual indication (color, an arrow, a name such as "Pipeline Modality X", etc.).

It is also worth making clear that the pipelines should ideally run per modality rather than being coupled to ALL modalities, as the diagram currently shows.

@arielleleon

arielleleon commented Jun 10, 2026

Collaborator Author

> [...] pipelines should ideally run modality specific and not coupled to ALL modalities like it is shown in the diagram.

@bruno-f-cruz This represents the asset as it is in S3. Each asset (with all modalities) gets attached to a pipeline. The pipeline just processes the modalities that it cares about.
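
As a rough sketch of that idea (the function and modality names below are made up for illustration and are not part of any existing pipeline code), a pipeline attached to a full asset could pick out only the modalities it handles:

```python
def select_modalities(asset_modalities, pipeline_modalities):
    """Return the asset's modalities that a given pipeline processes.

    Order follows the asset; modalities the pipeline does not handle
    are skipped. All names here are hypothetical.
    """
    return [m for m in asset_modalities if m in pipeline_modalities]


# An asset as attached to the pipeline: all modalities are present.
asset = ["behavior", "behavior-videos", "ecephys"]

# A hypothetical pipeline that only cares about one modality.
ecephys_pipeline = {"ecephys"}

selected = select_modalities(asset, ecephys_pipeline)
print(selected)
```

The asset stays whole in this sketch; the selection happens inside the pipeline, which matches what the diagram is meant to show.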

@arielleleon

Collaborator Author

@bruno-f-cruz - can you take one last look and approve?

Development

Successfully merging this pull request may close these issues.

mid-level diagrams for pipelines

2 participants