
Output Format #90

@bkatiemills

So, the core topic of today's meeting was what data format(s) our simulations should output. Key points:

Things We Agreed On

Some points were (more or less) unanimous, so we'll take them as final decisions to start working from:

  • There must be a completely lossless output format option that preserves Geant4 step-level information, especially for the sake of detector development.
  • There must be an output format that is either identical to GRSISpoon's output or a superset of it (i.e. TigFragments + Truth), targeted especially at preparing physics analyses.
  • Calibration and triggers should be downstream of the simulation package.

Things We Argued About

There is still some debate on what information we can and/or should put where:

  • How best to store step-level information?
  • If / how to modify TigFragment to accommodate hit level information.
  • How should we store truth information so that it is most useful for students and experimenters who only want physics information? Leading suggestion: a truth branch in the output tree.
  • What needs to be in the truth data?

A First-Order Solution

Debate on this topic is still completely open, but here is one possible scheme we can start perturbing from until we reach an optimal solution:

  • Simulations can output either or both of two filetypes:
    • Step nTuple: a flat Geant4 ntuple containing absolutely all step-level information generated in the simulation.
    • TigFragment Tree + Truth: the exact tree that would be output by GRSISpoon if this had been real data, plus a truth branch which contains the hit (but not step) level information.
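To make the "either or both" choice concrete, here is a minimal sketch of how a run could flag which files to write, plus what one row of the flat step nTuple might look like. Everything here is hypothetical: the `OutputMode` flags, the `StepRow` field names, and the helper functions are illustrative assumptions, not existing detectorSimulations code. In Geant4 itself, a flat ntuple like this would presumably be booked through `G4AnalysisManager` (e.g. `CreateNtupleIColumn` / `CreateNtupleDColumn`, then `AddNtupleRow` once per step).

```cpp
#include <cstdint>

// Hypothetical output-mode flags: a run may write either file type, or both.
enum OutputMode : std::uint8_t {
    kStepNtuple   = 1u << 0,  // flat ntuple, one row per Geant4 step
    kFragmentTree = 1u << 1,  // GRSISpoon-style TigFragment tree + truth branch
};

// Hypothetical flat row for the step nTuple. Every field is a plain scalar,
// so each Geant4 step maps to exactly one ntuple entry. Field choice and
// units are assumptions for illustration only.
struct StepRow {
    std::int64_t eventID;
    std::int32_t trackID;
    std::int32_t parentID;
    std::int32_t stepNumber;
    std::int32_t pdgCode;     // particle type of the track taking this step
    std::int32_t detectorID;  // volume / crystal the step occurred in
    double edep;              // energy deposited in this step
    double x, y, z;           // step position
    double time;              // global time of the step
};

// True if the step nTuple was requested for this run.
bool writesSteps(unsigned mode) { return (mode & kStepNtuple) != 0; }

// True if the TigFragment tree (+ truth) was requested for this run.
bool writesFragments(unsigned mode) { return (mode & kFragmentTree) != 0; }
```

Using bit flags keeps "both outputs" as a single value (`kStepNtuple | kFragmentTree`) rather than two separate switches that could drift out of sync.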

I believe this scheme covers all users without carrying around extra dead weight. @carlu, the step ntuple contains all the information you need for detector development, in a format that should be familiar. @evan012345, I believe the TigFragment tree will ultimately be the most usable for students (and everyone else): it is what comes out of our sort code, so users will have to be able to write analyses on this structure anyway if they ever want to analyze real data. Also keep in mind that a standard postprocessor that chews up TigFragments and spits out a simpler flat ntuple of meaningful physics parameters is a very viable product we could produce, to help users interpret the output of both GRSISpoon and detectorSimulations.
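As a rough illustration of that postprocessor idea, the sketch below collapses fragments into one flat row per event (event number, multiplicity, summed energy). The `Fragment` struct is a hypothetical stand-in, not the real TigFragment class, whose members differ; it also assumes energies arrive already calibrated, since calibration is meant to live downstream of the simulation.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a fragment; the real TigFragment has its own members.
struct Fragment {
    std::int64_t eventNumber;
    int detector;
    double energy;  // assumed already calibrated
};

// One row of a hypothetical flat "physics" ntuple.
struct PhysicsRow {
    std::int64_t eventNumber;
    int multiplicity;  // number of fragments in the event
    double sumEnergy;  // summed energy of those fragments
};

// Collapse fragments into one row per event, ordered by event number.
std::vector<PhysicsRow> flatten(const std::vector<Fragment>& fragments) {
    std::map<std::int64_t, PhysicsRow> byEvent;
    for (const Fragment& f : fragments) {
        // operator[] value-initializes (zeroes) a new row on first access.
        PhysicsRow& row = byEvent[f.eventNumber];
        row.eventNumber = f.eventNumber;
        row.multiplicity += 1;
        row.sumEnergy += f.energy;
    }
    std::vector<PhysicsRow> rows;
    rows.reserve(byEvent.size());
    for (const auto& entry : byEvent) rows.push_back(entry.second);
    return rows;
}
```

Because the input is the same structure GRSISpoon produces, the same postprocessor could in principle run on real and simulated data alike.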

Other auxiliary points that we discussed to include in this first-order approximation are:

  • The truth branch should be an empty vector or null pointer when unused, rather than a bunch of zeroes, to mitigate bloat on disk.
  • Truth branch should include at a minimum:
    • Hit position
    • PID
    • Parent process
    • Primary event
  • The variables included in the truth branch should be toggleable from the .mac file; for the time being we will stick to a small set of truth variables to produce a minimum viable product.
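One way to satisfy both "empty when unused" and "toggleable per variable" is to store the truth branch column-wise: one vector per variable, and a disabled variable simply never gets filled. The sketch below assumes that layout; `TruthConfig`, `TruthHit`, and `TruthBranch` are hypothetical names, and in a real Geant4 application the toggles would presumably be set from .mac commands through a `G4UImessenger`. ROOT trees can store `std::vector` branches like these directly.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-variable switches, e.g. set from .mac commands via a messenger.
struct TruthConfig {
    bool position = true;
    bool pid = true;
    bool parentProcess = true;
    bool primaryEvent = true;
};

// Truth info for one hit as produced by the simulation (hypothetical layout).
struct TruthHit {
    double x, y, z;
    int pid;
    std::string parentProcess;
    std::int64_t primaryEvent;
};

// Column-wise truth branch: a disabled variable stays an empty vector
// instead of being padded with zeroes.
struct TruthBranch {
    std::vector<double> x, y, z;
    std::vector<int> pid;
    std::vector<std::string> parentProcess;
    std::vector<std::int64_t> primaryEvent;

    // Reset between events so each tree entry holds only that event's hits.
    void clear() {
        x.clear(); y.clear(); z.clear();
        pid.clear(); parentProcess.clear(); primaryEvent.clear();
    }

    // Append one hit, filling only the columns enabled in cfg.
    void add(const TruthConfig& cfg, const TruthHit& hit) {
        if (cfg.position) { x.push_back(hit.x); y.push_back(hit.y); z.push_back(hit.z); }
        if (cfg.pid) pid.push_back(hit.pid);
        if (cfg.parentProcess) parentProcess.push_back(hit.parentProcess);
        if (cfg.primaryEvent) primaryEvent.push_back(hit.primaryEvent);
    }
};
```

Turning off every toggle leaves all columns empty, which costs almost nothing on disk compared with a fixed struct full of zeroes.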

Forward

This is a high-priority issue that needs to be resolved before we move much further. I expect the collaboration to reach consensus by the end of April at the latest, at which point we will adopt the best plan and move forward with it. Please propose and debate all changes in the comments so we can keep track of everyone's input.

cc @AdamGarnsworthy @pcbend @damiller @christinaburbadge @moukaddam @evitts @r3dunlop
