Explain what you would like to see improved and how.
I had a short discussion with @vepadulano regarding one issue we see in RDataFrame.
We run on TTrees that we need to "join" with BuildIndex(). In our case the number of events to match is around 500M, so the memory needed to hold the hash map used for the matching is huge. This would be manageable on its own, but each thread keeps its own copy of the map. With 40 threads we use more than 120 GB of memory (it would probably need much more, but that is the limit of our hardware). This is pretty restrictive, as the available workarounds are:
- Do not use that many threads
- Somehow split the files so you don't need a map of 500M entries
- "Just get more RAM"
These are not very compelling options.
We understand that this is probably beyond the scope of TTree and RDF support, but could this be improved for RNTuple and RDF? The current situation with TTrees is not sustainable.
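To illustrate the kind of behaviour we have in mind, here is a minimal sketch in plain C++ (no ROOT involved) where the index is built once and shared read-only by all threads instead of being copied per thread. Everything in it is hypothetical: `JoinIndex`, `BuildSharedIndex` and `CountMatches` are names made up for this example, and the key/entry layout is an assumption, not how TTreeIndex or RDF actually store things.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the join index: match key -> entry number in the friend tree.
using JoinIndex = std::unordered_map<std::uint64_t, std::int64_t>;

// Build the index exactly once. Here the keys are just the even numbers 0, 2, 4, ...
// (an assumption for the example); in the real use case they would come from the friend tree.
std::shared_ptr<const JoinIndex> BuildSharedIndex(std::size_t nEntries)
{
   auto index = std::make_shared<JoinIndex>();
   index->reserve(nEntries);
   for (std::size_t i = 0; i < nEntries; ++i)
      index->emplace(static_cast<std::uint64_t>(i) * 2u, static_cast<std::int64_t>(i));
   return index; // converts to shared_ptr<const JoinIndex>: read-only from here on
}

// Every worker thread looks keys up in the same map. Concurrent lookups on a
// container that nobody modifies are safe, so no per-thread copy is needed.
std::size_t CountMatches(const std::shared_ptr<const JoinIndex> &index, unsigned nThreads,
                         std::uint64_t nKeys)
{
   std::atomic<std::size_t> matches{0};
   std::vector<std::thread> workers;
   for (unsigned t = 0; t < nThreads; ++t) {
      workers.emplace_back([&, t] {
         std::size_t local = 0;
         // Thread t handles keys t, t + nThreads, t + 2 * nThreads, ...
         for (std::uint64_t key = t; key < nKeys; key += nThreads)
            if (index->find(key) != index->end())
               ++local;
         matches += local;
      });
   }
   for (auto &w : workers)
      w.join();
   return matches.load();
}
```

With this layout the memory cost of the index is paid once regardless of the number of threads, e.g. `CountMatches(BuildSharedIndex(1000), 4, 2000)` runs four workers against a single map. Whether something like this is feasible inside RDF is of course for the developers to judge.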
ROOT version
Any
Installation method
Any
Operating system
Any
Additional context
No response