incremental blame (rebased from #2457) #2625
Draft
Sebastian Thiel (Byron) wants to merge 8 commits into
The `git commit-graph write` command also supports writing a separate section in the commit-graph file that records which paths changed between a commit and its first parent. This information can significantly speed up some traversal operations, such as `git log -- <PATH>` and `git blame`. This commit teaches the git-commitgraph crate in gitoxide how to parse and access this information. We only implement reading v2 of these filters, because v1 is deprecated in Git: its murmur3 implementation mishandles path bytes with the high bit set, which can produce bad results for paths containing such bytes. The implementation is fully compatible with Git itself; it uses the exact same version of murmur3 that Git uses, including the seed hashes.
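The query side of a v2 changed-path filter can be sketched in plain Rust. This is a minimal, standalone sketch, not the gix-commitgraph code: it assumes Git's default settings (7 hash functions) and the two seeds from Git's `bloom.c`, derives the bit positions by double hashing (`h0 + i * h1`, wrapping), and tests bits least-significant-first within each byte. The helper names `murmur3_32`, `bit_positions`, `maybe_contains` and `insert` are hypothetical. Git also inserts every leading directory of a changed path into the filter, which this sketch omits.

```rust
/// Seeds for the two base hashes of a changed-path key (assumed from Git's bloom.c).
const SEED0: u32 = 0x293a_e76f;
const SEED1: u32 = 0x7e64_6e2c;
/// Number of hash functions per key in Git's default settings (assumed).
const NUM_HASHES: u32 = 7;

/// 32-bit murmur3 (x86 variant) over *unsigned* bytes, as v2 filters require.
fn murmur3_32(data: &[u8], seed: u32) -> u32 {
    const C1: u32 = 0xcc9e_2d51;
    const C2: u32 = 0x1b87_3593;
    let mut h = seed;
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        // Body: 4-byte little-endian blocks.
        let k = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        h ^= k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
        h = h.rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }
    // Tail: fold the remaining 1..=3 bytes in little-endian order.
    let tail = chunks.remainder();
    if !tail.is_empty() {
        let mut k1: u32 = 0;
        for (i, &b) in tail.iter().enumerate() {
            k1 ^= (b as u32) << (8 * i);
        }
        h ^= k1.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
    }
    // Finalization mix.
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Bit positions of `path` in a filter of `len_bytes` bytes (double hashing).
fn bit_positions(path: &str, len_bytes: usize) -> Vec<u64> {
    let h0 = murmur3_32(path.as_bytes(), SEED0);
    let h1 = murmur3_32(path.as_bytes(), SEED1);
    let bits = len_bytes as u64 * 8;
    (0..NUM_HASHES)
        .map(|i| (h0.wrapping_add(i.wrapping_mul(h1)) as u64) % bits)
        .collect()
}

/// `false` means "definitely not changed"; `true` only means "maybe changed".
fn maybe_contains(filter: &[u8], path: &str) -> bool {
    if filter.is_empty() {
        return true; // no usable filter: the caller must diff the trees
    }
    bit_positions(path, filter.len())
        .into_iter()
        .all(|pos| (filter[(pos / 8) as usize] & (1u8 << (pos % 8))) != 0)
}

/// Sets the bits for `path` (what a writer would do for each changed path).
fn insert(filter: &mut [u8], path: &str) {
    for pos in bit_positions(path, filter.len()) {
        filter[(pos / 8) as usize] |= 1u8 << (pos % 8);
    }
}
```

Only a `false` answer lets a caller skip work; `true` still needs a tree diff, because bloom filters admit false positives but never false negatives.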
Implement a `gix_blame::incremental` API that yields blame entries as they are discovered, similarly to Git's `git blame --incremental`. The implementation takes the original `gix_blame::file` and replaces its `Vec` of blame entries with a generic type implementing a new `BlameSink` trait. The original `gix_blame::file` is now a wrapper around `gix_blame::incremental`: it implements `BlameSink` for `Vec<BlameEntry>` and sorts and coalesces the entries before returning them.
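The shape of that refactoring can be sketched as follows. The `BlameEntry` struct, the `emit` method and both functions are hypothetical simplifications, not the gix_blame signatures; the sketch only shows how a `Vec<BlameEntry>` sink plus sort-and-coalesce recovers the non-incremental result from an incremental producer.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for gix_blame's blame entry.
#[derive(Debug, Clone, PartialEq)]
struct BlameEntry {
    /// Lines of the blamed file covered by this entry.
    lines: Range<u32>,
    /// Commit the lines are attributed to (a plain string in this sketch).
    commit: String,
}

/// Hypothetical sink that receives entries in the order they are discovered.
trait BlameSink {
    fn emit(&mut self, entry: BlameEntry);
}

/// Collecting into a Vec gives back the non-incremental behaviour.
impl BlameSink for Vec<BlameEntry> {
    fn emit(&mut self, entry: BlameEntry) {
        self.push(entry);
    }
}

/// Hypothetical incremental driver: forwards entries in discovery order, not file order.
fn blame_incremental<S: BlameSink>(discovered: &[BlameEntry], sink: &mut S) {
    for entry in discovered {
        sink.emit(entry.clone());
    }
}

/// Non-incremental wrapper: collect, sort by line, then merge adjacent runs of the same commit.
fn blame_file(discovered: &[BlameEntry]) -> Vec<BlameEntry> {
    let mut entries: Vec<BlameEntry> = Vec::new();
    blame_incremental(discovered, &mut entries);
    entries.sort_by_key(|e| e.lines.start);

    let mut coalesced: Vec<BlameEntry> = Vec::with_capacity(entries.len());
    for entry in entries {
        if let Some(prev) = coalesced.last_mut() {
            if prev.commit == entry.commit && prev.lines.end == entry.lines.start {
                // Contiguous lines from the same commit: extend the previous entry.
                prev.lines.end = entry.lines.end;
                continue;
            }
        }
        coalesced.push(entry);
    }
    coalesced
}
```

Keeping the trait generic means an incremental consumer pays nothing for sorting, while the classic API keeps its sorted, coalesced output.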
Use the new changed-path bloom filters from the commit graph to greatly speed up our blame implementation. Whenever a commit's bloom filter rejects the current path, that path is definitely unchanged in the commit, so we skip it altogether and pass the blame on to its parent without diffing the trees.
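A toy model of that shortcut, reducing blame to "which commit last touched this path" on a linear first-parent history. The `Commit` struct and `blame_target` function are hypothetical; the `filter` field stands in for a bloom filter as a set that must contain every changed path (no false negatives) but may contain extra ones (false positives), and `None` models a commit without a filter.

```rust
use std::collections::HashSet;

/// Hypothetical commit on a linear first-parent history.
struct Commit {
    id: &'static str,
    /// Ground truth: what an (expensive) tree diff against the parent would report.
    changed: HashSet<&'static str>,
    /// Stand-in for a bloom filter: a superset of `changed` (false positives allowed),
    /// or `None` if the commit graph has no filter for this commit.
    filter: Option<HashSet<&'static str>>,
}

/// Walks from newest to oldest and returns the first commit that changed `path`,
/// together with the number of tree diffs that were needed to find it.
fn blame_target(history: &[Commit], path: &str) -> (Option<&'static str>, usize) {
    let mut diffs = 0;
    for commit in history {
        if let Some(filter) = &commit.filter {
            if !filter.contains(path) {
                // Definitely unchanged here: pass the blame to the parent without diffing.
                continue;
            }
        }
        // "Maybe changed" or no filter: fall back to diffing the trees.
        diffs += 1;
        if commit.changed.contains(path) {
            return (Some(commit.id), diffs);
        }
    }
    (None, diffs)
}
```

In the test history below, a false positive in `c2` costs one extra diff but never a wrong answer, and `c3` is skipped without any diff.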
Implement the `log_file` method in gitoxide-core, which allows running path-limited log commands. With the new changed-path bloom filters, this operation can now be performed very efficiently.
Change `process_changes` to take `&[Change]` instead of `Vec<Change>`, eliminating the `changes.clone()` heap allocation at every call site. Replace the O(H×C) restart-from-beginning approach with a cursor that advances through the changes list across hunks. Non-suspect hunks are now skipped immediately. When the rare case of overlapping suspect ranges is detected (from merge blame convergence), the cursor safely resets to maintain correctness.
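The cursor idea can be sketched on plain line ranges. This is a hypothetical simplification, not the real `process_changes`: `Hunk`, `Change` and `overlapping_changes` are illustrative names, hunks and changes are each assumed sorted by start line, changes are assumed not to overlap one another, and the function only reports which changes overlap each suspect hunk. Changes that end before the current hunk can be skipped for good, because later hunks start no earlier; when a suspect hunk starts before the previous one ended (overlap or out-of-order input), the sketch conservatively resets the cursor to 0.

```rust
use std::ops::Range;

#[derive(Debug)]
struct Hunk {
    lines: Range<u32>,
    is_suspect: bool,
}

#[derive(Debug)]
struct Change {
    lines: Range<u32>,
}

/// For each suspect hunk (in order), the indices of the changes overlapping it.
fn overlapping_changes(hunks: &[Hunk], changes: &[Change]) -> Vec<Vec<usize>> {
    let mut cursor = 0;
    let mut prev_end = 0;
    let mut result = Vec::new();
    for hunk in hunks {
        if !hunk.is_suspect {
            continue; // non-suspect hunks are skipped immediately
        }
        if hunk.lines.start < prev_end {
            cursor = 0; // overlapping suspect ranges: restart to stay correct
        }
        prev_end = hunk.lines.end;
        // Advance past changes that end before this hunk starts.
        while cursor < changes.len() && changes[cursor].lines.end <= hunk.lines.start {
            cursor += 1;
        }
        // Collect changes until one starts at or after the hunk's end.
        let mut hits = Vec::new();
        let mut i = cursor;
        while i < changes.len() && changes[i].lines.start < hunk.lines.end {
            if changes[i].lines.end > hunk.lines.start {
                hits.push(i);
            }
            i += 1;
        }
        result.push(hits);
    }
    result
}
```

In the common sorted case each change is passed by the cursor at most once, so the cost is roughly proportional to H + C plus the overlaps actually reported, rather than H×C.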
Compare the performance of the implementation with and without the commit graph cache.
gix-blame::incremental/without-commit-graph
time: [14.852 s 14.895 s 14.944 s]
change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
Change within noise threshold.
gix-blame::incremental/with-commit-graph
time: [287.55 ms 290.30 ms 292.85 ms]
change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
No change in performance detected.
Signed-off-by: Vicent Marti <vmg@strn.cat>
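From the point estimates above, the run with the commit graph is roughly fifty times faster. Note that the `change:` lines compare each benchmark against Criterion's previously saved run, not against each other.

```math
\text{speedup} = \frac{14.895\ \text{s}}{290.30\ \text{ms}} = \frac{14.895}{0.29030} \approx 51.3
```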
The `BlameSink` trait now returns a `std::ops::ControlFlow` value that can be used to interrupt the blame early.
Signed-off-by: Vicent Marti <vmg@strn.cat>
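A minimal sketch of early termination through `std::ops::ControlFlow`. The single-method `BlameSink` here, the `StopAt` sink and the `drive` function are hypothetical stand-ins, not the gix_blame API; they only show how a sink returning `ControlFlow::Break` lets the producer stop with `?`.

```rust
use std::ops::ControlFlow;

/// Hypothetical single-method sink: returning Break asks the producer to stop.
trait BlameSink {
    fn emit(&mut self, line_start: u32, commit: &str) -> ControlFlow<()>;
}

/// Example sink that records line starts and stops once `target` is blamed.
struct StopAt {
    target: String,
    seen: Vec<u32>,
}

impl BlameSink for StopAt {
    fn emit(&mut self, line_start: u32, commit: &str) -> ControlFlow<()> {
        self.seen.push(line_start);
        if commit == self.target {
            ControlFlow::Break(())
        } else {
            ControlFlow::Continue(())
        }
    }
}

/// Hypothetical producer: forwards discovered entries until the sink breaks.
fn drive<S: BlameSink>(discovered: &[(u32, &str)], sink: &mut S) -> ControlFlow<()> {
    for &(line_start, commit) in discovered {
        // `?` returns early with the sink's Break value.
        sink.emit(line_start, commit)?;
    }
    ControlFlow::Continue(())
}
```

Using `ControlFlow` rather than a `bool` makes the intent explicit at the call site and composes with `?`.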
Signed-off-by: Vicent Marti <vmg@strn.cat>
A rebased version of #2457, with automated conflict resolution.
This is a side-car PR, with the conversation on the original PR.