First implementation of exporting input variables and infer the ParT network with onnxruntime #77

Open
suehara wants to merge 68 commits into lcfiplus:master from suehara:master

Conversation

Member

@suehara suehara commented Dec 10, 2025

BEGINRELEASENOTES
This summary was generated automatically and is also written in ReleaseNotes.md

  • 2025-12-05 SUEHARA Taikan

    • Implement backward compatibility for MC-PFO assignment
      • Add default parameter to InitMCPPFOCollections for backward compatibility
      • Use simple method (first element) when Track-MC relation is not available
      • Use improved method (weight-based, multi-track support) when available
      • Maintain full backward compatibility with upstream v00-11
  • 2025-12-03 SUEHARA Taikan

    • Add optional ONNX support with backward compatibility
      • Enable ONNX Runtime support as optional feature (ENABLE_ONNX CMake option, default: OFF)
      • Make onnxruntime and nlohmann_json optional dependencies
      • Conditionally compile ONNX-related source files
      • Full backward compatibility when ENABLE_ONNX=OFF
  • SUEHARA Taikan and collaborators (2024-2025)

    • Machine Learning and ONNX integration

      • Add MLInputGenerator, MLMakeNtuple, MLInferenceWeaver for ML-based flavor tagging
      • Add WeaverInterface and ONNXRuntime for ONNX model inference
      • Add DNNProvider2 for DNN-based vertex finding
      • Implement event-based classification with jets
      • Add dEdx support for particle identification
    • Flavor tagging improvements

      • Improve PFA-track assignment and track-MC assignment
      • Implement true jet flavor assignment from MC
      • Add MC-to-jet assignment algorithm (AssignJetsToMC)
      • Bugfixes on MC flavor assignment
      • Add sorted track and neutral accessors (getAllTracksSorted, getNeutralsSorted)
    • Code quality and compatibility

      • Update C++ standard to C++17 with CMake 3.5+ requirement
      • Compatibility fixes for key4hep environment and onnxruntime
      • Various bugfixes in weaver output and neutral PF candidate masking
      • Add event-based input support

ENDRELEASENOTES

tomohikosan and others added 21 commits January 21, 2025 17:38
Enable ONNX Runtime support as an optional feature that can be enabled
via ENABLE_ONNX CMake option (default: OFF).

Changes:
- Add ENABLE_ONNX CMake option (default OFF) for backward compatibility
- Make onnxruntime and nlohmann_json optional dependencies
- Conditionally compile ONNX-related source files based on ENABLE_ONNX
- Conditionally include ONNX-related headers in ROOT dictionary
- Remove ONNX source files from build when ENABLE_ONNX=OFF

When ENABLE_ONNX=OFF (default):
- No ONNXRuntime dependency required
- ONNX-related files not compiled (ONNXRuntime.cc, MLInferenceWeaver.cc, etc.)
- Full backward compatibility with existing builds

When ENABLE_ONNX=ON:
- Requires onnxruntime and nlohmann_json
- Enables ML inference features via ONNX

Both build configurations tested successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
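
The conditional compilation described in this commit could look roughly like the sketch below. The macro name `LCFIPLUS_ENABLE_ONNX` and the function `inferenceBackend` are hypothetical; the commit only states that the `ENABLE_ONNX` CMake option decides which sources are compiled.

```cpp
#include <string>

// Hypothetical macro: assume the build system defines it only when the
// ENABLE_ONNX CMake option is ON. The real macro name is not shown in the PR.
#ifdef LCFIPLUS_ENABLE_ONNX
constexpr bool kOnnxEnabled = true;
#else
constexpr bool kOnnxEnabled = false;
#endif

// Report which inference backend this build can use.
inline std::string inferenceBackend() {
  return kOnnxEnabled ? "onnxruntime" : "none";
}
```

With the macro undefined (the default build), `inferenceBackend()` returns `"none"` and no ONNX Runtime symbols are needed at link time.
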
Add default parameter to InitMCPPFOCollections to maintain backward
compatibility with upstream while supporting improved Track-MC relation
based assignment.

Changes:
- Add default parameter mctrkColName="" to InitMCPPFOCollections()
- Use simple method (first element) when Track-MC relation is not available
- Use improved method (weight-based, multi-track support) when Track-MC relation is available
- Maintain full backward compatibility with upstream v00-11

Implementation details:
- navTrks.size() == 0: Use simple PFO-MC relation[0] (upstream compatible)
- navTrks.size() > 0: Use improved Track[max Pt] -> MC[max weight] method

Benefits:
- No code changes required for existing users
- Gradual migration path via steering file parameter
- Same binary supports both modes
- Transparent fallback to compatible mode

Testing:
- Successfully built with ONNX disabled (26M library)
- Successfully built with ONNX enabled (34M library)
- Zero compilation errors
- API symbols correctly exported

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
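
The two assignment modes listed under "Implementation details" can be sketched as follows. This is only an illustration under stated assumptions: `McLink`, `TrackLinks` and `assignMc` are hypothetical stand-ins, not the LCIO or LCFIPlus types used in the PR.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for the relation entries; not the LCFIPlus types.
struct McLink { int mcId; double weight; };
struct TrackLinks { double pt; std::vector<McLink> links; };

// Pick the MC particle id for one PFO, or -1 if nothing is linked.
// pfoLinks: PFO-MC relation entries; trackLinks: Track-MC relation entries
// for the tracks of this PFO (empty when no Track-MC relation is given).
inline int assignMc(const std::vector<McLink>& pfoLinks,
                    const std::vector<TrackLinks>& trackLinks) {
  if (trackLinks.empty()) {
    // Compatible mode: take the first element of the PFO-MC relation.
    return pfoLinks.empty() ? -1 : pfoLinks.front().mcId;
  }
  // Improved mode: pick the highest-pT track first ...
  const auto best = std::max_element(
      trackLinks.begin(), trackLinks.end(),
      [](const TrackLinks& a, const TrackLinks& b) { return a.pt < b.pt; });
  if (best->links.empty()) return -1;
  // ... then its MC link with the largest weight.
  const auto link = std::max_element(
      best->links.begin(), best->links.end(),
      [](const McLink& a, const McLink& b) { return a.weight < b.weight; });
  return link->mcId;
}
```

Because the fallback is selected by the size of the Track-MC input alone, the same binary serves both modes, which matches the "transparent fallback" benefit listed in the commit.
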
Merge official LCFIPlus v00-11 release with development branch containing
ML/ONNX support and flavor tagging improvements.

From upstream v00-11:
- Add key4hep-build CI workflow for automated testing
- Make building against ROOT 6.38 possible
- Explicitly include LCIO headers for better compatibility
- Fix exception handling (use reference instead of value)
- Backport key4hep-spack patches

From development branch:
- Optional ONNX Runtime support (ENABLE_ONNX CMake option, default: OFF)
- ML-based flavor tagging infrastructure (MLInputGenerator, MLMakeNtuple, MLInferenceWeaver)
- Improved MC-PFO assignment with backward compatibility
- DNN-based vertex finding (DNNProvider2, VertexFinderDNN)
- Event-based classification support
- Enhanced flavor tagging algorithms
- C++17 standard with CMake 3.5+ requirement

Merge resolution:
- Keep upstream CMAKE requirements where possible
- Restore ${LCIO_INCLUDE_DIRS} in ROOT_DICT_INCLUDE_DIRS (from upstream)
- Add ROOT_DICT_CINT_DEFINITIONS (from upstream)
- Maintain ONNX conditional compilation (development branch)
- Keep both upstream v00-11 and development additions in release notes
- Preserve key4hep CI workflow from upstream

Backward compatibility:
- Default behavior matches upstream when ONNX disabled
- Optional features enabled via CMake flags and steering parameters
- No breaking changes to existing physics logic

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Move "Development branch additions" section to the top of the release
notes, making it the first section readers see. This better reflects
the chronological order and highlights the most recent changes.

Changes:
- Move development branch section from subsection (##) to main section (#)
- Place it before v00-11 upstream release notes
- Improves readability by showing newest changes first

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

…nderdnn

Replace unsafe memset() calls with proper C++ value-initialization.
- DNNProvider2.cc: Use DNNData() constructor instead of memset
- vertexfinderdnn.cc: Use TracksData() constructor instead of memset

This fixes compiler warnings about clearing objects with non-trivial
types (std::vector members) and ensures proper initialization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
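
To illustrate why value-initialization is preferable to memset for such structs, here is a minimal sketch. `DnnDataSketch` is a hypothetical struct; the actual members of DNNData and TracksData are not shown in this PR excerpt.

```cpp
#include <vector>

// Hypothetical struct with a non-trivial member, similar in spirit to the
// data structs mentioned in the commit (the real fields are not shown here).
struct DnnDataSketch {
  float px;
  int nHits;
  std::vector<float> scores;
};

// Value-initialization zeroes the scalars and default-constructs the vector.
// memset(&d, 0, sizeof d) would instead overwrite the vector's internals,
// which is not guaranteed to leave it in a valid state.
inline DnnDataSketch makeCleared() {
  return DnnDataSketch();
}
```

Since the struct has no user-provided constructor, `DnnDataSketch()` zero-initializes `px` and `nHits` before the vector is default-constructed.
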
Comment out unused variable 'nall' in the process() method to clean up the code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Collaborator

@tmadlener tmadlener left a comment

I have started to have a look here and I was wondering whether you could add some example steering files for

  • Running data collection for training
  • Running the inference with the trained model

That would also help me judge if all of the classes are in the end necessary. It's not entirely clear to me if all of the different ways of doing inference are necessary or if we could drop them.

There are a few comments below which I found while having a first quick look.

Comment on lines +1274 to +1292
vector<const Track*> tracks_unsorted = getAllTracks(withoutV0);

// order particles by energy
vector<std::pair<float, int> > order_tr;
order_tr.resize(tracks_unsorted.size());
for(size_t i=0; i<tracks_unsorted.size(); ++i){
order_tr[i] = std::pair<float, int>(tracks_unsorted[i]->E(), i);
}
std::sort(order_tr.begin(),order_tr.end(),[](std::pair<float,int>a,std::pair<float,int> b){
return a.first > b.first;
});

vector<const Track*> tracks;
tracks.resize(tracks_unsorted.size());

for (size_t i=0; i<tracks_unsorted.size(); ++i) {
tracks[i] = tracks_unsorted[order_tr[i].second];
}
return tracks;
Collaborator

Suggested change (replacing the lines quoted above)
auto tracks = getAllTracks(withoutV0);
// order particles by energy, highest first
std::ranges::sort(tracks, std::greater{}, &Track::E);
return tracks;

Since we are building with C++20 in any case, we can also use std::ranges::sort and avoid creating an intermediate vector just for sorting. The same applies to the neutrals below.

Comment on lines +47 to +48
registerInputCollection(LCIO::LCRELATION, "MCTrackRelation", "Relation between MC and tracks, usually better in terms of assignment of tracks",
_mctrkRelationName, std::string(""));
Collaborator

Will these relations be persisted to files in the end? Maybe I am just missing it, but are the FromType and ToType parameters set on this collection so that it is easily possible to identify what is being linked in this collection?

Comment on lines +14 to +16
namespace MLInputGenerator {

extern map<string, variant<
Collaborator

Why not make this a class and mark the map as static? As far as I can tell that should have the same effect?


// copied from FCCANalyses/analyzers/dataframe/src/ReconstructedParticle2Track.cc
float calc_dxy(float D0_wrt0, float Z0_wrt0, float phi0_wrt0, TVector3 p, TVector3 privtx, int charge){
double Bz = 3.5;
Collaborator

This should probably be an input parameter, and configurable somehow in the long run.

|| std::holds_alternative<function<double(const Track*, const Vertex*)> >(v.second)
|| std::holds_alternative<function<double(const Neutral*)> >(v.second)
|| std::holds_alternative<function<double(const Neutral*, const Vertex*)> >(v.second)){
_tree->Branch( key.c_str(), &_data.newDataVec(key) );
Collaborator

Does this construction work properly? I remember that in the past, adding a vector to a tree that later had to be re-allocated because it grew made the pointer stored here invalid. I don't see any explicit calls to reserve or resize for these vectors that would make them "stable" when adding elements.
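
The stability question can be examined in isolation with a small sketch. `BranchBuffers` is a hypothetical container; whether the PR's `_data` stores its vectors this way is an assumption, since that code is not visible in the quoted lines. If the tree keeps the address of the vector object itself (also an assumption about how `Branch` is used here), what matters is that the vector object never moves, not that its element storage stays in place.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical container of per-variable buffers. std::map never relocates
// its elements, so the address of each vector object stays valid when more
// keys are added, and growing a vector does not move the vector object.
class BranchBuffers {
 public:
  std::vector<float>& newDataVec(const std::string& key) { return _vecs[key]; }
 private:
  std::map<std::string, std::vector<float>> _vecs;
};
```

If the buffers were instead held in a `std::vector<std::vector<float>>`, adding a new variable could relocate every earlier vector object and invalidate the stored addresses.
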

Comment on lines +134 to +137
if (_outEvent && _outEventNoJets) {
cout << "Skipping due to ambiguous setting: MLMakeNtuple.EventClassification and MLMakeNtuple.EventClassificationNoJets are both turned on" << endl;
return;
}
Collaborator

I think this should be checked in the initialization and it should also be a hard error already there, not something that will just create a lot of log messages but still continues to run.

Comment on lines +162 to +173
TrackVec &tracks_orig = event->getTracks();
NeutralVec &neutrals_orig = event->getNeutrals();

vector<const Track *> tracks(tracks_orig.size());
vector<const Neutral *> neutrals(neutrals_orig.size());

std::partial_sort_copy(tracks_orig.begin(),tracks_orig.end(),tracks.begin(), tracks.end(), [](const Track *a, const Track *b){
return a->E() > b->E();
});
std::partial_sort_copy(neutrals_orig.begin(),neutrals_orig.end(),neutrals.begin(), neutrals.end(),[](const Neutral *a, const Neutral *b){
return a->E() > b->E();
});
Collaborator

What is the reason to partial_sort_copy here? As far as I can see, tracks_orig and neutrals_orig are only used here and no longer after sorting has happened, and even though it says partial_sort here the effect will be that the whole vector will be sorted. sort is usually quicker than partial_sort. So I would propose something along the lines of

auto tracks = event->getTracks(); // NOTE: the explicit copy we do here by omitting '&'
std::ranges::sort(tracks, std::greater{}, &Track::E);
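
The claim that partial_sort_copy into a destination of the same size amounts to a full sort can be checked with a small sketch. Plain doubles stand in for the energies, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Copy, then sort in descending order: the copy-and-sort variant.
inline std::vector<double> copyAndSort(const std::vector<double>& in) {
  std::vector<double> out(in.begin(), in.end());
  std::sort(out.begin(), out.end(), [](double a, double b) { return a > b; });
  return out;
}

// partial_sort_copy into a destination of the same size: also a full sort.
inline std::vector<double> partialSortCopyAll(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  std::partial_sort_copy(in.begin(), in.end(), out.begin(), out.end(),
                         [](double a, double b) { return a > b; });
  return out;
}
```

Both functions produce the same ordering, so the choice between them is about readability and speed rather than results.
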

Collaborator

Is this header used anywhere? It looks like it can be removed?

Comment on lines +1 to +37
# Development branch additions (merged 2025-12-06)

* 2025-12-05 SUEHARA Taikan
  - Implement backward compatibility for MC-PFO assignment
    - Add default parameter to InitMCPPFOCollections for backward compatibility
    - Use simple method (first element) when Track-MC relation is not available
    - Use improved method (weight-based, multi-track support) when available
    - Maintain full backward compatibility with upstream v00-11

* 2025-12-03 SUEHARA Taikan
  - Add optional ONNX support with backward compatibility
    - Enable ONNX Runtime support as optional feature (ENABLE_ONNX CMake option, default: OFF)
    - Make onnxruntime and nlohmann_json optional dependencies
    - Conditionally compile ONNX-related source files
    - Full backward compatibility when ENABLE_ONNX=OFF

* SUEHARA Taikan and collaborators (2024-2025)
  - Machine Learning and ONNX integration
    - Add MLInputGenerator, MLMakeNtuple, MLInferenceWeaver for ML-based flavor tagging
    - Add WeaverInterface and ONNXRuntime for ONNX model inference
    - Add DNNProvider2 for DNN-based vertex finding
    - Implement event-based classification with jets
    - Add dEdx support for particle identification

  - Flavor tagging improvements
    - Improve PFA-track assignment and track-MC assignment
    - Implement true jet flavor assignment from MC
    - Add MC-to-jet assignment algorithm (AssignJetsToMC)
    - Bugfixes on MC flavor assignment
    - Add sorted track and neutral accessors (getAllTracksSorted, getNeutralsSorted)

  - Code quality and compatibility
    - Update C++ standard to C++17 with CMake 3.5+ requirement
    - Compatibility fixes for key4hep environment and onnxruntime
    - Various bugfixes in weaver output and neutral PF candidate masking
    - Add event-based input support

Collaborator

Suggested change: delete the lines quoted above.

These will be automatically added by our tagging script if you put them between the BEGINRELEASENOTES and ENDRELEASENOTES in the PR as you have already done.

suehara and others added 8 commits December 25, 2025 15:04
Replace hardcoded 3.5 Tesla with Globals::Instance()->getBField()
in MLInputGenerator and DNNProvider2 to support different experimental
setups and allow user configuration via steering files.

Affected files:
- src/MLInputGenerator.cc (calc_dxy, calc_dz)
- src/DNNProvider2.cc (calc_dxy, calc_dz)

Addresses PR lcfiplus#77 review comment from tmadlener regarding hardcoded
magnetic field value that should be configurable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
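
The idea of reading the field from a global setting instead of a literal can be sketched as below. `GlobalsSketch`, `setBField` and `helixRadius` are hypothetical; only the call `Globals::Instance()->getBField()` is named in the commit, and the radius formula is the standard helix relation, not necessarily what calc_dxy or calc_dz compute.

```cpp
#include <cstdlib>

// Hypothetical singleton mirroring the Globals::Instance()->getBField() call
// named in the commit; the real Globals class has more state than this.
class GlobalsSketch {
 public:
  static GlobalsSketch* Instance() {
    static GlobalsSketch instance;  // created on first use
    return &instance;
  }
  double getBField() const { return _bField; }
  void setBField(double tesla) { _bField = tesla; }
 private:
  GlobalsSketch() = default;
  double _bField = 3.5;  // assumed default in Tesla (the old hardcoded value)
};

// Helix radius in metres for transverse momentum pt (GeV) and charge q (in e),
// using the standard relation R = pt / (0.299792458 * |q| * B).
inline double helixRadius(double pt, int charge) {
  const double bz = GlobalsSketch::Instance()->getBField();
  return pt / (0.299792458 * std::abs(charge) * bz);
}
```

Changing the field once through the singleton then affects every helper that reads it, with no per-function constant to keep in sync.
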
Move EventClassification conflict check from process() to init()
to throw an error early instead of silently skipping events at runtime.

This ensures users are immediately notified of configuration errors
during initialization rather than discovering the issue after processing
begins.

Affected files:
- src/MLMakeNtuple.cc (init, process)

Addresses PR lcfiplus#77 review comment from tmadlener requesting that
ambiguous configuration settings should throw errors during init
rather than emit warnings at runtime.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
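
A minimal sketch of failing early on the ambiguous setting, assuming the check runs once at initialization. `NtupleSettingsSketch` and `validate` are hypothetical names, and the exception type is a choice made for the sketch, not taken from the PR.

```cpp
#include <stdexcept>

// Hypothetical settings holder; the two flags correspond to the
// EventClassification and EventClassificationNoJets parameters.
struct NtupleSettingsSketch {
  bool outEvent = false;
  bool outEventNoJets = false;
};

// Called once at initialization: reject the ambiguous combination up front
// instead of skipping every event later.
inline void validate(const NtupleSettingsSketch& s) {
  if (s.outEvent && s.outEventNoJets) {
    throw std::invalid_argument(
        "MLMakeNtuple.EventClassification and "
        "MLMakeNtuple.EventClassificationNoJets must not both be enabled");
  }
}
```
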
Replace partial_sort_copy with direct std::sort for better
readability and performance. Use vector range constructor
instead of resize + partial_sort_copy for cleaner code.

Changes:
- src/MLMakeNtuple.cc: Use vector(begin, end) constructor and std::sort
- include/VertexFinderTearDown.h: Same improvement for Chi2 track sorting

This addresses PR lcfiplus#77 review comment suggesting to use direct sorting
instead of partial_sort_copy with intermediate vectors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add type information (MCParticle to Track/ReconstructedParticle)
to MCPFORelation and MCTrackRelation parameter descriptions
for better clarity.

Changes:
- src/LcfiplusProcessor.cc: Update parameter descriptions

Addresses PR lcfiplus#77 review comment requesting clearer type information
for relation collection parameters.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement EventStore flag management to enable ParticleID output for jets
read from LCIO files. This fixes the issue where jets imported from LCIO
were not written back with their assigned ParticleID values.

Changes:
- Add EventStore::AddFlags() and RemoveFlags() for runtime flag management
- Mark jet collections as PERSIST in MLInferenceWeaver and FlavorTag init()
- Update WriteJets() to add ParticleIDs to existing LCIO collections
- Maintain backward compatibility for internally created jets

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add optional parameter to create a new jet collection with ParticleID
instead of modifying jets read from LCIO files. This avoids errors
when trying to add ParticleID to read-only LCIO collections.

Changes:
- Add UpdateJetCollectionName parameter (default: empty string)
- When empty, use original behavior (add ParticleID to existing jets)
- When specified, create new jet collection with copied jets and ParticleID
- Use EventStore::Register() to create new collection with PERSIST flag
- Copy jets using Jet copy constructor without vertex extraction

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add sample steering files for ML inference and training data collection:
- ml_inference_test.xml: Example for running MLInferenceWeaver
- ml_training_data_collection.xml: Example for MLMakeNtuple data collection

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Refactor MLInputGenerator to use a static class with private members
instead of namespace-level variables. This improves encapsulation and
follows C++ best practices.

Changes:
- Convert calcInput map and _initialized flag to private static members
- Add public getCalcInput() accessor method
- Convert helper functions to public static methods
- Update MLMakeNtuple.cc and MLInferenceWeaver.cc to use new API

Addresses review comment: lcfiplus#77...

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
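
The refactoring described in this commit could take roughly the following shape. This is a reduced, hypothetical sketch: `InputGeneratorSketch` and its two entries are invented for illustration, and the real map holds a std::variant of several std::function signatures rather than a single one.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical, much reduced version of a static class wrapping the map.
class InputGeneratorSketch {
 public:
  using Calc = std::function<double(double)>;

  // Accessor that fills the map on first use, then returns it read-only.
  static const std::map<std::string, Calc>& getCalcInput() {
    if (!_initialized) {
      _calcInput["identity"] = [](double x) { return x; };
      _calcInput["squared"] = [](double x) { return x * x; };
      _initialized = true;
    }
    return _calcInput;
  }

 private:
  inline static std::map<std::string, Calc> _calcInput;  // C++17 inline static
  inline static bool _initialized = false;
};
```

Callers only see the accessor, so the map and the initialization flag cannot be modified from outside the class.
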

3 participants