[ntuple] Address some RNTupleProcessor performance bottlenecks #22593
enirolf wants to merge 3 commits into
std::size_t currProcessorNumber = fCurrentProcessorNumber;
ROOT::NTupleSize_t entriesSeen = 0;
for (unsigned i = 0; i < currProcessorNumber; ++i) {
   entriesSeen += fInnerProcessors[i]->GetNEntries();
}
Not for this PR, but in principle we could keep a cache vector with the number of entries per processor, filled lazily at discovery time whenever a processor needs to connect to its file(s).
Actually, this is exactly what is done a few lines down, and somehow I didn't think to do it here, so thanks for pointing this out :D. Let me quickly add it here as well.
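The lazily filled cache suggested above could look roughly like the following. This is a minimal, standalone sketch and not the code in this PR: `ToyProcessor`, `EntryCountCache`, `fCountEntries` and `EntriesBefore` are hypothetical names, and the expensive "connect and count" step is modelled by a plain callback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an inner processor. Counting its entries is
// assumed to be expensive (e.g. it requires connecting to a file).
struct ToyProcessor {
   std::function<std::uint64_t()> fCountEntries;
};

// Caches the number of entries per inner processor. A slot is filled the
// first time its count is requested and reused afterwards.
class EntryCountCache {
   std::vector<ToyProcessor> fProcessors;
   std::vector<std::optional<std::uint64_t>> fCounts;

public:
   explicit EntryCountCache(std::vector<ToyProcessor> processors)
      : fProcessors(std::move(processors)), fCounts(fProcessors.size())
   {
   }

   // Number of entries in processor i, computed at most once.
   std::uint64_t GetNEntries(std::size_t i)
   {
      assert(i < fProcessors.size());
      if (!fCounts[i])
         fCounts[i] = fProcessors[i].fCountEntries();
      return *fCounts[i];
   }

   // Total number of entries in all processors preceding processorNumber.
   std::uint64_t EntriesBefore(std::size_t processorNumber)
   {
      assert(processorNumber <= fProcessors.size());
      std::uint64_t entriesSeen = 0;
      for (std::size_t i = 0; i < processorNumber; ++i)
         entriesSeen += GetNEntries(i);
      return entriesSeen;
   }
};
```

With such a cache, repeating the summation loop costs only vector lookups after the first pass, since each processor is asked for its entry count at most once.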
be1986b to 0c7ea38
724d342 to d7d839b
Test Results: 21 files, 21 suites, 3d 5h 44m 41s. Results for commit d7d839b.
// If the requested entry number is lower than the current entry number, we have to again localise the correct local
// entry number starting from the first processor in the chain. Otherwise, we can continue looking from the inner
// processor that is currently connected, which is much faster when the chain consists of many inner processors.
if (entryNumber < fCurrentEntryNumber) {
Can't this be sped up in the case where entryNumber is less than fCurrentEntryNumber but greater than the starting entry number of the current file? (And/or is the set of lengths cached, and thus fast to go through again?)
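One way to picture the case raised in this question is the sketch below. It assumes the per-processor entry counts are already cached in a plain vector and that the global index of the current processor's first entry is tracked; `ChainPosition` and `Locate` are hypothetical names for illustration and do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical bookkeeping for the currently connected inner processor.
struct ChainPosition {
   std::size_t fProcessorNumber = 0; // index of the inner processor
   std::uint64_t fFirstEntry = 0;    // global index of its first entry
};

// Finds the processor that holds entryNumber, given cached entry counts.
// The search restarts from the front of the chain only when the requested
// entry lies before the first entry of the current processor; a smaller
// entry number within the current processor keeps the current position.
// Returns std::nullopt when entryNumber is past the end of the chain.
std::optional<ChainPosition> Locate(const std::vector<std::uint64_t> &counts, ChainPosition current,
                                    std::uint64_t entryNumber)
{
   assert(current.fProcessorNumber <= counts.size());
   if (entryNumber < current.fFirstEntry)
      current = ChainPosition{};

   while (current.fProcessorNumber < counts.size()) {
      const std::uint64_t nEntries = counts[current.fProcessorNumber];
      if (entryNumber < current.fFirstEntry + nEntries)
         return current;
      current.fFirstEntry += nEntries;
      ++current.fProcessorNumber;
   }
   return std::nullopt;
}
```

In this sketch the comparison is made against the first entry of the current processor rather than against the current entry number, so stepping backwards within the same file would not trigger a scan from the start of the chain. Whether that maps cleanly onto the actual `RNTupleProcessor` internals is left open here.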
This change prevents the unnecessary re-connection of inner RNTuples in a chain upon calling `LoadEntry`, by first checking whether the requested entry is before or after the currently loaded entry. In addition, some unnecessary calls to `Initialize` and `Connect` in `RNTupleSingleProcessor::GetNEntries` have been factored out.