First implementation of exporting input variables and infer the ParT network with onnxruntime #77
suehara wants to merge 68 commits into lcfiplus:master
… ML input variables
Enable ONNX Runtime support as an optional feature that can be enabled via the ENABLE_ONNX CMake option (default: OFF).
Changes:
- Add ENABLE_ONNX CMake option (default OFF) for backward compatibility
- Make onnxruntime and nlohmann_json optional dependencies
- Conditionally compile ONNX-related source files based on ENABLE_ONNX
- Conditionally include ONNX-related headers in ROOT dictionary
- Remove ONNX source files from build when ENABLE_ONNX=OFF
When ENABLE_ONNX=OFF (default):
- No ONNXRuntime dependency required
- ONNX-related files not compiled (ONNXRuntime.cc, MLInferenceWeaver.cc, etc.)
- Full backward compatibility with existing builds
When ENABLE_ONNX=ON:
- Requires onnxruntime and nlohmann_json
- Enables ML inference features via ONNX
Both build configurations tested successfully.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add default parameter to InitMCPPFOCollections to maintain backward compatibility with upstream while supporting improved Track-MC relation based assignment.
Changes:
- Add default parameter mctrkColName="" to InitMCPPFOCollections()
- Use simple method (first element) when Track-MC relation is not available
- Use improved method (weight-based, multi-track support) when Track-MC relation is available
- Maintain full backward compatibility with upstream v00-11
Implementation details:
- navTrks.size() == 0: Use simple PFO-MC relation[0] (upstream compatible)
- navTrks.size() > 0: Use improved Track[max Pt] -> MC[max weight] method
Benefits:
- No code changes required for existing users
- Gradual migration path via steering file parameter
- Same binary supports both modes
- Transparent fallback to compatible mode
Testing:
- Successfully built with ONNX disabled (26M library)
- Successfully built with ONNX enabled (34M library)
- Zero compilation errors
- API symbols correctly exported
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
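The two assignment modes described in this commit message can be sketched roughly as follows. This is only an illustration of the selection logic as stated (first PFO-MC relation when no track relations exist, otherwise highest-pT track and then highest-weight MC link). The types `MCParticle` and `TrackLink`, the function `chooseMCParticle`, and the `nullptr` fallbacks are hypothetical stand-ins, not the LCIO or LCFIPlus API.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real LCIO/LCFIPlus types are not shown in this PR excerpt.
struct MCParticle { int id; };

struct TrackLink {
  double pt;                                                  // transverse momentum of the track
  std::vector<std::pair<const MCParticle*, double>> mcLinks;  // (MC particle, relation weight)
};

// Pick the MC particle for one PFO.
// - No track relations: fall back to the first PFO-MC relation (the "simple" mode).
// - Otherwise: take the highest-pT track, then its highest-weight MC link.
const MCParticle* chooseMCParticle(const std::vector<const MCParticle*>& pfoLinks,
                                   const std::vector<TrackLink>& trackLinks) {
  if (trackLinks.empty()) {
    return pfoLinks.empty() ? nullptr : pfoLinks.front();
  }
  const auto bestTrack = std::max_element(
      trackLinks.begin(), trackLinks.end(),
      [](const TrackLink& a, const TrackLink& b) { return a.pt < b.pt; });
  if (bestTrack->mcLinks.empty()) {
    return nullptr;  // assumption: no MC link on the chosen track means "unassigned"
  }
  const auto bestLink = std::max_element(
      bestTrack->mcLinks.begin(), bestTrack->mcLinks.end(),
      [](const auto& a, const auto& b) { return a.second < b.second; });
  return bestLink->first;
}
```

Keeping both branches in one function mirrors the "same binary supports both modes" point: the caller only decides whether track relations are supplied.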
Merge official LCFIPlus v00-11 release with development branch containing
ML/ONNX support and flavor tagging improvements.
From upstream v00-11:
- Add key4hep-build CI workflow for automated testing
- Make building against ROOT 6.38 possible
- Explicitly include LCIO headers for better compatibility
- Fix exception handling (use reference instead of value)
- Backport key4hep-spack patches
From development branch:
- Optional ONNX Runtime support (ENABLE_ONNX CMake option, default: OFF)
- ML-based flavor tagging infrastructure (MLInputGenerator, MLMakeNtuple, MLInferenceWeaver)
- Improved MC-PFO assignment with backward compatibility
- DNN-based vertex finding (DNNProvider2, VertexFinderDNN)
- Event-based classification support
- Enhanced flavor tagging algorithms
- C++17 standard with CMake 3.5+ requirement
Merge resolution:
- Keep upstream CMAKE requirements where possible
- Restore ${LCIO_INCLUDE_DIRS} in ROOT_DICT_INCLUDE_DIRS (from upstream)
- Add ROOT_DICT_CINT_DEFINITIONS (from upstream)
- Maintain ONNX conditional compilation (development branch)
- Keep both upstream v00-11 and development additions in release notes
- Preserve key4hep CI workflow from upstream
Backward compatibility:
- Default behavior matches upstream when ONNX disabled
- Optional features enabled via CMake flags and steering parameters
- No breaking changes to existing physics logic
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move "Development branch additions" section to the top of the release notes, making it the first section readers see. This better reflects the chronological order and highlights the most recent changes.
Changes:
- Move development branch section from subsection (##) to main section (#)
- Place it before v00-11 upstream release notes
- Improves readability by showing newest changes first
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…nderdnn
Replace unsafe memset() calls with proper C++ value-initialization.
- DNNProvider2.cc: Use DNNData() constructor instead of memset
- vertexfinderdnn.cc: Use TracksData() constructor instead of memset
This fixes compiler warnings about clearing objects with non-trivial types (std::vector members) and ensures proper initialization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
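As a minimal sketch of why value-initialization is preferred over memset for such structs: the struct `TrackFeatures` below is a hypothetical stand-in (the actual layout of DNNData and TracksData is not shown here), chosen only because it mixes plain floats with a std::vector member.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a data struct with a std::vector member.
struct TrackFeatures {
  float d0;
  float z0;
  std::vector<float> covariance;
};

// A std::vector member makes the type non-trivially-copyable, so overwriting
// its bytes with memset is not a valid way to clear it.
static_assert(!std::is_trivially_copyable_v<TrackFeatures>,
              "memset would bypass the vector's invariants");

// Value-initialization: scalars become zero, the vector is default-constructed (empty).
TrackFeatures makeCleared() { return TrackFeatures(); }

// Resetting an existing object: assign a value-initialized temporary.
void reset(TrackFeatures& f) { f = TrackFeatures(); }
```

Assigning a fresh temporary also releases whatever the old vector held through its normal assignment operator, which a byte-wise clear would skip.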
Comment out unused variable 'nall' in the process() method to clean up the code.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
tmadlener left a comment
I have started to have a look here and I was wondering whether you could add some example steering files for
- Running data collection for training
- Running the inference with the trained model
That would also help me judge if all of the classes are in the end necessary. It's not entirely clear to me if all of the different ways of doing inference are necessary or if we could drop them.
There are a few comments below which I found while having a first quick look.
    vector<const Track*> tracks_unsorted = getAllTracks(withoutV0);

    // order particles by energy
    vector<std::pair<float, int> > order_tr;
    order_tr.resize(tracks_unsorted.size());
    for(size_t i=0; i<tracks_unsorted.size(); ++i){
      order_tr[i] = std::pair<float, int>(tracks_unsorted[i]->E(), i);
    }
    std::sort(order_tr.begin(),order_tr.end(),[](std::pair<float,int>a,std::pair<float,int> b){
      return a.first > b.first;
    });

    vector<const Track*> tracks;
    tracks.resize(tracks_unsorted.size());

    for (size_t i=0; i<tracks_unsorted.size(); ++i) {
      tracks[i] = tracks_unsorted[order_tr[i].second];
    }
    return tracks;
Suggested change:
    auto tracks_unsorted = getAllTracks(withoutV0);
    // std::greater{} keeps the descending energy order of the original lambda
    std::ranges::sort(tracks_unsorted, std::greater{}, &Track::E);
    return tracks_unsorted;
Since we are building with C++20 in any case, we can also use std::ranges::sort and avoid creating an intermediate vector just for sorting. The same applies to the neutrals below.
src/LcfiplusProcessor.cc
Outdated
    registerInputCollection(LCIO::LCRELATION, "MCTrackRelation", "Relation between MC and tracks, usually better in terms of assignment of tracks",
                            _mctrkRelationName, std::string(""));
Will these relations be persisted to files in the end? Maybe I am just missing it, but are the FromType and ToType parameters set on this collection so that it is easily possible to identify what is being linked in this collection?
include/MLInputGenerator.h
Outdated
    namespace MLInputGenerator {

    extern map<string, variant<
Why not make this a class and mark the map as static? As far as I can tell that should have the same effect?
src/MLInputGenerator.cc
Outdated
    // copied from FCCANalyses/analyzers/dataframe/src/ReconstructedParticle2Track.cc
    float calc_dxy(float D0_wrt0, float Z0_wrt0, float phi0_wrt0, TVector3 p, TVector3 privtx, int charge){
      double Bz = 3.5;
This should probably be an input parameter, and configurable somehow in the long run.
    || std::holds_alternative<function<double(const Track*, const Vertex*)> >(v.second)
    || std::holds_alternative<function<double(const Neutral*)> >(v.second)
    || std::holds_alternative<function<double(const Neutral*, const Vertex*)> >(v.second)){
      _tree->Branch( key.c_str(), &_data.newDataVec(key) );
Does this construction work properly? I remember that in the past, adding a vector to a tree that later had to be re-allocated because it grew made the pointer stored here invalid. I don't see any explicit calls to reserve or resize here for these vectors that would make them "stable" when adding elements.
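Whether the stored address stays valid depends on how newDataVec keeps its vectors, which is not visible in this excerpt. As a general C++ illustration (not specific to ROOT, and with purely illustrative function names), the sketch below contrasts three cases: growing a vector does not move the vector object itself, a node-based container such as std::map never moves existing values on insertion, and an outer std::vector does move its elements when it reallocates.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace {
// Record an object's address as an integer so it can be compared safely later.
std::uintptr_t addressOf(const std::vector<float>& v) {
  return reinterpret_cast<std::uintptr_t>(&v);
}
}  // namespace

// Growing a vector reallocates its buffer, but the vector object itself stays put.
bool innerGrowthKeepsObjectAddress() {
  std::vector<float> values;
  const auto before = addressOf(values);
  for (int i = 0; i < 1000; ++i) values.push_back(static_cast<float>(i));
  return before == addressOf(values);
}

// std::map is node-based: inserting further keys never moves existing values.
bool mapKeepsElementAddress() {
  std::map<std::string, std::vector<float>> store;
  const auto before = addressOf(store["first"]);
  for (int i = 0; i < 1000; ++i) store.try_emplace("key" + std::to_string(i));
  return before == addressOf(store["first"]);
}

// Elements held directly in an outer std::vector move when that vector reallocates.
bool outerVectorMovesElements() {
  std::vector<std::vector<float>> store(1);
  const auto before = addressOf(store.front());
  const std::size_t oldCapacity = store.capacity();
  while (store.capacity() == oldCapacity) store.emplace_back();  // force one reallocation
  return before != addressOf(store.front());
}
```

So an address handed out once is only at risk in the third layout; calling reserve on the inner vectors would not change that.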
src/MLMakeNtuple.cc
Outdated
    if (_outEvent && _outEventNoJets) {
      cout << "Skipping due to ambiguous setting: MLMakeNtuple.EventClassification and MLMakeNtuple.EventClassificationNoJets are both turned on" << endl;
      return;
    }
I think this should be checked in the initialization and it should also be a hard error already there, not something that will just create a lot of log messages but still continues to run.
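A minimal sketch of what such an initialization-time check could look like, assuming the two flags are available as plain booleans once the parameters are read. `NtupleConfig` and `validateConfig` are hypothetical names, and the choice of std::invalid_argument is an assumption rather than the exception type the framework actually uses.

```cpp
#include <stdexcept>

// Hypothetical holder for the two mutually exclusive switches.
struct NtupleConfig {
  bool eventClassification = false;
  bool eventClassificationNoJets = false;
};

// Intended to be called once from init(): fail fast instead of skipping every event.
void validateConfig(const NtupleConfig& cfg) {
  if (cfg.eventClassification && cfg.eventClassificationNoJets) {
    throw std::invalid_argument(
        "MLMakeNtuple.EventClassification and "
        "MLMakeNtuple.EventClassificationNoJets must not both be enabled");
  }
}
```

With the check done up front, process() no longer needs the per-event guard or its log message.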
    TrackVec &tracks_orig = event->getTracks();
    NeutralVec &neutrals_orig = event->getNeutrals();

    vector<const Track *> tracks(tracks_orig.size());
    vector<const Neutral *> neutrals(neutrals_orig.size());

    std::partial_sort_copy(tracks_orig.begin(),tracks_orig.end(),tracks.begin(), tracks.end(), [](const Track *a, const Track *b){
      return a->E() > b->E();
    });
    std::partial_sort_copy(neutrals_orig.begin(),neutrals_orig.end(),neutrals.begin(), neutrals.end(),[](const Neutral *a, const Neutral *b){
      return a->E() > b->E();
    });
What is the reason to partial_sort_copy here? As far as I can see, tracks_orig and neutrals_orig are only used here and no longer after sorting has happened, and even though it says partial_sort here the effect will be that the whole vector will be sorted. sort is usually quicker than partial_sort. So I would propose something along the lines of
    auto tracks = event->getTracks(); // NOTE: the explicit copy we do here by omitting '&'
    std::ranges::sort(tracks, std::greater{}, &Track::E);
include/MLInferenceTorch.h
Outdated
Is this header used anywhere? It looks like it can be removed?
    # Development branch additions (merged 2025-12-06)

    * 2025-12-05 SUEHARA Taikan
      - Implement backward compatibility for MC-PFO assignment
        - Add default parameter to InitMCPPFOCollections for backward compatibility
        - Use simple method (first element) when Track-MC relation is not available
        - Use improved method (weight-based, multi-track support) when available
        - Maintain full backward compatibility with upstream v00-11

    * 2025-12-03 SUEHARA Taikan
      - Add optional ONNX support with backward compatibility
        - Enable ONNX Runtime support as optional feature (ENABLE_ONNX CMake option, default: OFF)
        - Make onnxruntime and nlohmann_json optional dependencies
        - Conditionally compile ONNX-related source files
        - Full backward compatibility when ENABLE_ONNX=OFF

    * SUEHARA Taikan and collaborators (2024-2025)
      - Machine Learning and ONNX integration
        - Add MLInputGenerator, MLMakeNtuple, MLInferenceWeaver for ML-based flavor tagging
        - Add WeaverInterface and ONNXRuntime for ONNX model inference
        - Add DNNProvider2 for DNN-based vertex finding
        - Implement event-based classification with jets
        - Add dEdx support for particle identification

      - Flavor tagging improvements
        - Improve PFA-track assignment and track-MC assignment
        - Implement true jet flavor assignment from MC
        - Add MC-to-jet assignment algorithm (AssignJetsToMC)
        - Bugfixes on MC flavor assignment
        - Add sorted track and neutral accessors (getAllTracksSorted, getNeutralsSorted)

      - Code quality and compatibility
        - Update C++ standard to C++17 with CMake 3.5+ requirement
        - Compatibility fixes for key4hep environment and onnxruntime
        - Various bugfixes in weaver output and neutral PF candidate masking
        - Add event-based input support
These will be automatically added by our tagging script if you put them between the BEGINRELEASENOTES and ENDRELEASENOTES in the PR as you have already done.
Replace hardcoded 3.5 Tesla with Globals::Instance()->getBField() in MLInputGenerator and DNNProvider2 to support different experimental setups and allow user configuration via steering files.
Affected files:
- src/MLInputGenerator.cc (calc_dxy, calc_dz)
- src/DNNProvider2.cc (calc_dxy, calc_dz)
Addresses PR lcfiplus#77 review comment from tmadlener regarding hardcoded magnetic field value that should be configurable.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
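To illustrate why the field value matters for helix-based quantities, here is a small sketch of the standard relation between transverse momentum, charge, field strength and radius of curvature, with the field passed in as a parameter. This is not the PR's calc_dxy or calc_dz; the function name `curvatureRadiusMetres` and the error handling are hypothetical.

```cpp
#include <cmath>
#include <cstdlib>
#include <stdexcept>

// Radius of curvature in metres of a charged particle in a solenoidal field:
//   r = pT / (0.299792458 * |q| * |B|)   with pT in GeV, q in units of e, B in Tesla.
double curvatureRadiusMetres(double ptGeV, int charge, double bFieldTesla) {
  if (charge == 0 || bFieldTesla == 0.0) {
    throw std::invalid_argument("no curvature for a neutral particle or zero field");
  }
  constexpr double kGeVPerTeslaMetre = 0.299792458;  // speed of light in units of 1e9 m/s
  return ptGeV / (kGeVPerTeslaMetre * std::abs(charge) * std::abs(bFieldTesla));
}
```

The same 1 GeV track curves with a radius 1.75 times larger at 2 T than at 3.5 T, so any impact-parameter calculation that bakes in 3.5 T would be off for a detector with a different field.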
Move EventClassification conflict check from process() to init() to throw an error early instead of silently skipping events at runtime. This ensures users are immediately notified of configuration errors during initialization rather than discovering the issue after processing begins.
Affected files:
- src/MLMakeNtuple.cc (init, process)
Addresses PR lcfiplus#77 review comment from tmadlener requesting that ambiguous configuration settings should throw errors during init rather than emit warnings at runtime.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace partial_sort_copy with direct std::sort for better readability and performance. Use vector range constructor instead of resize + partial_sort_copy for cleaner code.
Changes:
- src/MLMakeNtuple.cc: Use vector(begin, end) constructor and std::sort
- include/VertexFinderTearDown.h: Same improvement for Chi2 track sorting
This addresses PR lcfiplus#77 review comment suggesting to use direct sorting instead of partial_sort_copy with intermediate vectors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add type information (MCParticle to Track/ReconstructedParticle) to MCPFORelation and MCTrackRelation parameter descriptions for better clarity.
Changes:
- src/LcfiplusProcessor.cc: Update parameter descriptions
Addresses PR lcfiplus#77 review comment requesting clearer type information for relation collection parameters.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement EventStore flag management to enable ParticleID output for jets read from LCIO files. This fixes the issue where jets imported from LCIO were not written back with their assigned ParticleID values.
Changes:
- Add EventStore::AddFlags() and RemoveFlags() for runtime flag management
- Mark jet collections as PERSIST in MLInferenceWeaver and FlavorTag init()
- Update WriteJets() to add ParticleIDs to existing LCIO collections
- Maintain backward compatibility for internally created jets
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add optional parameter to create a new jet collection with ParticleID instead of modifying jets read from LCIO files. This avoids errors when trying to add ParticleID to read-only LCIO collections.
Changes:
- Add UpdateJetCollectionName parameter (default: empty string)
- When empty, use original behavior (add ParticleID to existing jets)
- When specified, create new jet collection with copied jets and ParticleID
- Use EventStore::Register() to create new collection with PERSIST flag
- Copy jets using Jet copy constructor without vertex extraction
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add sample steering files for ML inference and training data collection:
- ml_inference_test.xml: Example for running MLInferenceWeaver
- ml_training_data_collection.xml: Example for MLMakeNtuple data collection
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Refactor MLInputGenerator to use a static class with private members instead of namespace-level variables. This improves encapsulation and follows C++ best practices.
Changes:
- Convert calcInput map and _initialized flag to private static members
- Add public getCalcInput() accessor method
- Convert helper functions to public static methods
- Update MLMakeNtuple.cc and MLInferenceWeaver.cc to use new API
Addresses review comment: lcfiplus#77...
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
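The shape of such a refactoring might look like the sketch below: a class whose map and initialization flag are private static members, reached through a public static accessor. Only the names getCalcInput and _initialized come from the commit message; the class name `InputRegistry`, the variable keys, the reduced variant alternatives and the lazy initialization are assumptions made for illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-ins for the LCFIPlus particle types.
struct Track { double energy; };
struct Neutral { double energy; };

class InputRegistry {  // hypothetical name for the static class
 public:
  using TrackFn = std::function<double(const Track*)>;
  using NeutralFn = std::function<double(const Neutral*)>;
  using Calc = std::variant<TrackFn, NeutralFn>;

  // Public accessor; fills the map on first use (an assumption of this sketch).
  static const std::map<std::string, Calc>& getCalcInput() {
    if (!_initialized) init();
    return _calcInput;
  }

 private:
  static void init() {
    _calcInput["tr_e"] = TrackFn([](const Track* t) { return t->energy; });
    _calcInput["neu_e"] = NeutralFn([](const Neutral* n) { return n->energy; });
    _initialized = true;
  }

  // C++17 inline static members need no separate definition in a .cc file.
  inline static std::map<std::string, Calc> _calcInput;
  inline static bool _initialized = false;
};
```

Callers then dispatch on the stored alternative, for example with std::holds_alternative or std::visit, without being able to modify the map.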
BEGINRELEASENOTES
These notes are generated automatically; the full entries are written in ReleaseNotes.md.
* 2025-12-05 SUEHARA Taikan
* 2025-12-03 SUEHARA Taikan
* SUEHARA Taikan and collaborators (2024-2025)
  - Machine Learning and ONNX integration
  - Flavor tagging improvements
  - Code quality and compatibility
ENDRELEASENOTES