168 mid level diagrams for pipelines by arielleleon · Pull Request #169 · AllenNeuralDynamics/aind-software-docs

arielleleon · 2026-06-05T00:02:52Z

Added diagrams to demonstrate pipeline modularity, following the diagrams in PR #162 (Add diagram of modular experiments).

What this should show:

Pipeline modularity based off modular data inputs
Processing modularity - pipelines can have at least three capsules for packaging, QC and metadata generation (for the derived asset and the NWB file)
Pipelines each output an NWB file that can be combined for release later
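To make the three goals above concrete, here is a minimal sketch of one pipeline per modality, each built from three capsules and each emitting its own NWB file. Everything in it is an assumption for illustration: the capsule functions, the modality names (`ecephys`, `behavior`), the file names, and the placeholder QC status are hypothetical and not taken from any AIND library.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "capsule" is modeled as a function that takes
# the asset so far and returns it with new outputs added.
Asset = Dict[str, object]
Capsule = Callable[[Asset], Asset]


def package(asset: Asset) -> Asset:
    """Packaging capsule: records the name of the NWB file for this modality."""
    return {**asset, "nwb": f"{asset['modality']}.nwb"}


def quality_control(asset: Asset) -> Asset:
    """QC capsule: attaches a placeholder qc.json payload."""
    return {**asset, "qc.json": {"status": "pending"}}


def generate_metadata(asset: Asset) -> Asset:
    """Metadata capsule: attaches a placeholder processing.json payload."""
    return {**asset, "processing.json": {"steps": ["package", "quality_control"]}}


def run_pipeline(asset: Asset, capsules: List[Capsule]) -> Asset:
    """Run the capsules in order, feeding each one the previous result."""
    for capsule in capsules:
        asset = capsule(asset)
    return asset


CAPSULES: List[Capsule] = [package, quality_control, generate_metadata]

# One independent pipeline run per modality.
derived = {m: run_pipeline({"modality": m}, CAPSULES) for m in ("ecephys", "behavior")}

# The per-modality NWB files that could be combined for a later release.
nwb_files = sorted(str(d["nwb"]) for d in derived.values())
```

Each modality runs through the same capsule sequence independently, which is the modularity the diagram is meant to show. How the NWB files are actually combined is left out, since the PR does not describe it.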

bruno-f-cruz · 2026-06-05T00:31:45Z

Much better, but I would also highlight the modular organization of the "raw data", and ideally, the acquisition too.

arielleleon · 2026-06-05T00:59:11Z

> Much better, but I would also highlight the modular organization of the "raw data", and ideally, the acquisition too.

Is this what you meant?

bruno-f-cruz

1. What is aind schematized metadata? Why not call it aind-data-schema?
2. Metadata is not split by modality in s3/docdb. At that point, there is already a "merged" metadata.
3. I thought the thing you felt was missing from the previous diagram was a specific callout to which files are generated where. This diagram has the same issue, no? It should have a Rig schematic somewhere that says acquisition.json/instrument.json (these can be split by modality). The pipelines should say processing.json/qc.json, etc....
4. You should not call out "plots" specifically; these should always be called "artifacts". What if people want to save tables, sounds, videos, etc...?
5. It is not clear to me what processing.json -> Aggregate processing -> bracket is doing. Can you describe it to see if there is a better way to go about this?
6. Overall, I think the pink box is a bit too chaotic and could use more structure.

I think it would help if you added a few bullet points on what you think this diagram should be describing and what the major features of the architecture are that you want to highlight. I don't think the current diagram is super faithful to what I feel the current architecture is.

arielleleon · 2026-06-05T23:54:03Z

> 1. What is aind schematized metadata? Why not call it aind-data-schema?
> 2. Metadata is not split by modality in s3/docdb. At that point, there is already a "merged" metadata.
> 3. I thought the thing you felt was missing from the previous diagram was a specific callout to which files are generated where. This diagram has the same issue, no? It should have a Rig schematic somewhere that says acquisition.json/instrument.json (these can be split by modality). The pipelines should say processing.json/qc.json, etc....
> 4. You should not call out "plots" specifically; these should always be called "artifacts". What if people want to save tables, sounds, videos, etc...?
> 5. It is not clear to me what processing.json -> Aggregate processing -> bracket is doing. Can you describe it to see if there is a better way to go about this?
> 6. Overall, I think the pink box is a bit too chaotic and could use more structure.
>
> I think it would help if you added a few bullet points on what you think this diagram should be describing and what the major features of the architecture are that you want to highlight. I don't think the current diagram is super faithful to what I feel the current architecture is.

1. Addressed in commit 53f9efa.
2. That is what the large file around all the data called "Raw Data" is supposed to represent.
3. I did address this by calling out which files the pipeline is generating, not the rig.
4. Fixed in commit 8971cfe.
5. Let me think about how to make the processing.json aggregation clearer.
6. Yeah - let me see how I can make it more compact.

I updated the PR description with my goals. To reiterate, the goal is to create a mid-level diagram that illustrates how pipeline processing is done.

arielleleon · 2026-06-06T04:13:11Z

@bruno-f-cruz - I tried to do a better job on points 5 and 6

bruno-f-cruz · 2026-06-06T05:49:14Z

> @bruno-f-cruz - I tried to do a better job on points 5 and 6

Almost there. I think it is still misleading to have metadata per modality. That is something that SciComp has made quite a big deal of in the past: When data lands in the public bucket, metadata must be merged.

arielleleon · 2026-06-10T20:43:36Z

> I think it is still misleading to have metadata per modality. That is something that SciComp has made quite a big deal of in the past: When data lands in the public bucket, metadata must be merged.

But there will be metadata per modality. I am trying to represent the general metadata in each modality that gets used to make the final schema.

bruno-f-cruz · 2026-06-10T20:45:42Z

> > I think it is still misleading to have metadata per modality. That is something that SciComp has made quite a big deal of in the past: When data lands in the public bucket, metadata must be merged.
>
> But there will be metadata per modality. I am trying to represent the general metadata in each modality that gets used to make the final schema.

I don't follow. Formally, metadata merging is a strictly destructive operation. After merging, how do you know which metadata corresponds to which modality? For instance, how do you know to which modality a specific piece of hardware belongs?
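One way to read this concern is sketched below, assuming a naive merge that flattens per-modality records into a single one. The metadata layout, the device names, and the `merge` function are all hypothetical; they are not how aind-data-schema performs merging.

```python
from typing import Dict, List

# Hypothetical per-modality instrument metadata; device names are made up.
per_modality: Dict[str, Dict[str, List[str]]] = {
    "ecephys": {"devices": ["probe_a"]},
    "behavior": {"devices": ["camera_1", "lick_sensor"]},
}


def merge(metadata_by_modality: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Flatten all modalities into one record, dropping the modality keys."""
    merged: Dict[str, List[str]] = {"devices": []}
    for record in metadata_by_modality.values():
        merged["devices"].extend(record["devices"])
    return merged


merged = merge(per_modality)

# The merged record lists every device, but nothing in it says which
# modality a given device came from.
```

After this kind of merge, recovering the owner of `camera_1` requires information that is no longer in the record, which is the sense in which the operation is destructive.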

arielleleon · 2026-06-10T20:48:11Z

> > > I think it is still misleading to have metadata per modality. That is something that SciComp has made quite a big deal of in the past: When data lands in the public bucket, metadata must be merged.
> >
> > But there will be metadata per modality. I am trying to represent the general metadata in each modality that gets used to make the final schema.
>
> I don't follow. Formally, metadata merging is a strictly destructive operation. After merging, how do you know which metadata corresponds to which modality? For instance, how do you know to which modality a specific piece of hardware belongs?

Don't you merge your acquisition files?

arielleleon · 2026-06-10T20:51:56Z

@bruno-f-cruz removed the modality metadata

bruno-f-cruz · 2026-06-10T21:02:20Z

> @bruno-f-cruz removed the modality metadata

This one needs to be removed too.

Also, the dashed line represents a zoomed-in version of the blue boxes if I am understanding correctly. You may want to call that out with some sort of visual indication. (using color, an arrow, name "Pipeline Modality X", etc...)

It is also worth making it clear that the pipelines should ideally run modality-specific and not be coupled to ALL modalities as shown in the diagram.

arielleleon · 2026-06-10T21:17:51Z

> It is also worth making it clear that the pipelines should ideally run modality-specific and not be coupled to ALL modalities as shown in the diagram.

@bruno-f-cruz
This represents the asset as it is in S3. Each asset (with all modalities) gets attached to a pipeline. The pipeline just processes the modalities that it cares about.
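A minimal sketch of that behavior, assuming the asset can be modeled as a mapping from modality name to files. The modality names, file names, and the `select_inputs` helper are hypothetical and only illustrate the idea of a pipeline ignoring modalities it does not process.

```python
from typing import Dict, List, Set

# Hypothetical asset layout as it might sit in S3: all modalities together.
asset: Dict[str, List[str]] = {
    "ecephys": ["probe_a.bin"],
    "behavior": ["session.csv"],
    "behavior-videos": ["camera_1.mp4"],
}


def select_inputs(full_asset: Dict[str, List[str]], wanted: Set[str]) -> Dict[str, List[str]]:
    """Return only the modalities this pipeline processes; the rest are ignored."""
    return {m: files for m, files in full_asset.items() if m in wanted}


# The whole asset is attached, but this pipeline only reads ecephys data.
ecephys_inputs = select_inputs(asset, {"ecephys"})
```

The full asset is attached unchanged; the selection happens inside the pipeline, so attaching all modalities does not couple the pipeline to all of them.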

arielleleon · 2026-06-11T01:35:58Z

@bruno-f-cruz - can you take one last look and approve?

arielleleon added 3 commits June 4, 2026 16:57

feat: add new pipeline processing diagrams

f9c18ac

fix: add titles and adjust layout

f74ecf6

chore; adjust layout

46ca3f1

arielleleon linked an issue Jun 5, 2026 that may be closed by this pull request

mid-level diagrams for pipelines #168

Open

arielleleon requested a review from bruno-f-cruz June 5, 2026 00:02

chore: remove extra arrow

a17439e

chore: demonstrate modality separation better

f108e9c

arielleleon added 2 commits June 4, 2026 18:00

chore: centering

316348c

chore: change color schema and recentering

cbc4611

bruno-f-cruz requested changes Jun 5, 2026

arielleleon added 2 commits June 5, 2026 16:46

chore: rename metadata to reference aind-data-schema

53f9efa

chore: rename Plots to Artifacts

8971cfe

chore: refactoring the design of the pipeline innards

462c21d

chore: rebalance diagram

8730c0c

arielleleon requested a review from jtyoung84 June 10, 2026 20:43

chore: remove individual metadata from modality representation

6c4b9e5
