ML-based particle flow (MLPF) focuses on developing full event reconstruction for particle detectors using computationally scalable and flexible machine learning models. The project aims to improve particle-flow reconstruction across various detector environments, including CMS and future detectors supported via Key4HEP. We build on existing, open-source simulation software from the experimental collaborations.
Below is the development timeline of MLPF by our team, ranging from initial proofs of concept to full detector simulations and fine-tuning studies.
2021: First full-event GNN demonstration of MLPF
- Paper: MLPF: efficient machine-learned particle-flow reconstruction using graph neural networks (Eur. Phys. J. C)
- Focus: Initial idea with a GNN and scalable graph building.
- Code: v1.1
- Dataset: Zenodo Record
2021: First demonstration in CMS Run 3
- Paper: Machine Learning for Particle Flow Reconstruction at CMS (J. Phys. Conf. Ser.)
- Focus: First demonstration of feasibility within CMS.
- Detector Performance Note: CERN-CMS-DP-2021-030
2022: Improved performance in CMS Run 3
- Detector Performance Note: CERN-CMS-DP-2022-061
- Focus: We showed that training against a generator-level target can improve performance in CMS.
2024: Improved performance with full simulation for future colliders
- Paper: Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors (Communications Physics)
- Focus: Improved event-level performance in full simulation for future colliders.
- Code: v1.6.2
- Results: Zenodo Record
2025: Fine-tuning across detectors
- Paper: Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders (Phys. Rev. D)
- Focus: Showed that fine-tuning a pretrained model reduces the required amount of training data by 10x.
- Code: v2.3.0
2026: CMS Run 3 full results
- Detector Performance Note: CERN-CMS-DP-2025-033
- Focus: Improved jet performance over the baseline; first validation on real data.
- Paper: CMS Run 3 paper (submitted to EPJC)
- Code: v2.4.0
Please ensure you use the correct version of the jpata/particleflow software with the corresponding dataset version.
| Code Version | CMS Dataset | CLIC Dataset | CLD Dataset |
|---|---|---|---|
| 1.9.0 | 2.4.0 | 2.2.0 | NA |
| 2.0.0 | 2.4.0 | 2.3.0 | NA |
| 2.1.0 | 2.5.0 | 2.5.0 | NA |
| 2.2.0 | 2.5.0 | 2.5.0 | 2.5.0 |
| 2.3.0 | 2.5.0 | 2.5.0 | 2.5.0 |
| 2.4.0 | 2.6.0 | 2.5.0 | 2.5.0 |
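The CMS column of the table above can be encoded as a small lookup helper, for example to select the matching dataset for a given code tag in a processing script. This is a sketch; the function name `cms_dataset_for` is hypothetical and not part of the repository:

```shell
# Hypothetical helper: map a jpata/particleflow code version to the
# compatible CMS dataset version, following the compatibility table above.
cms_dataset_for() {
  case "$1" in
    1.9.0|2.0.0)       echo "2.4.0" ;;
    2.1.0|2.2.0|2.3.0) echo "2.5.0" ;;
    2.4.0)             echo "2.6.0" ;;
    *) echo "unknown code version: $1" >&2; return 1 ;;
  esac
}

cms_dataset_for 2.4.0  # prints 2.6.0
```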
The full data generation, model training, and validation workflow is managed using Pixi for environment management and Snakemake for job orchestration.
```bash
curl -fsSL https://pixi.sh/install.sh | bash
# Restart your shell or source your .bashrc
```

Configure the environment for your specific cluster. This sets up the necessary Snakemake profiles and site defaults.
- Tallinn (Slurm):

  ```bash
  pixi run -e tallinn init
  ```

- lxplus (HTCondor):

  ```bash
  pixi run -e lxplus init
  ```

Generate the Snakefile for a production campaign corresponding to your site.
```bash
PROD=cms_run3 STEPS=gen,post,tfds,train pixi run -e lxplus generate
```

You can inspect snakemake_jobs/cms_run3/Snakefile and the related scripts to understand the workflow.
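The STEPS variable is a comma-separated list of pipeline stages (gen, post, tfds, train). As a rough sketch using only standard shell tools (the `list_steps` helper is hypothetical, and the echo stands in for the real per-stage work), such a list can be split and iterated like this:

```shell
# Hypothetical sketch: split a comma-separated STEPS list into one
# pipeline stage per line, then loop over the stages in order.
list_steps() {
  echo "$1" | tr ',' '\n'
}

for step in $(list_steps gen,post,tfds,train); do
  echo "running stage: $step"
done
```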
Launch the workflow on the batch system. It is recommended to run this inside a tmux or screen session.
```bash
PROD=cms_run3 STEPS=gen,post,tfds,train pixi run -e lxplus run
```

To run the validation plotting workflow:
```bash
PROD=cms_run3 pixi run -e lxplus validation
```

You are welcome to reuse the code in accordance with the LICENSE.
How to Cite
- Academic Work: Please cite the specific papers listed in the Publications section above relevant to the method you are using (e.g., initial GNN idea, fine-tuning, or specific detector studies).
- Code Usage: If you use the code significantly for research, please cite the specific tagged version from Zenodo.
- Dataset Usage: Cite the appropriate dataset via the Zenodo link and the corresponding paper.
Contact
For collaboration ideas that do not fit into the categories above, please get in touch via GitHub Discussions.
