commit-reach: use the decoration hash for tips_reachable_from_bases() #2116
Open
spkrka wants to merge 1 commit into
tips_reachable_from_bases() walks the commit graph from a set of base
commits to find which tip commits are reachable. The inner loop does
a linear scan over the tips array to check whether each visited commit
is a tip, making the overall cost O(C * T) where C is commits walked
and T is the number of tips.

Replace the linear scan with the decoration hash for lookups, reducing
the per-commit tip check from O(T) to O(1) and the overall cost from
O(C * T) to O(C + T).

This function is called by `git for-each-ref --merged` and
`git branch/tag --contains/--no-contains` via reach_filter() in
ref-filter.c.

Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):

    Command                          Before   After   Speedup
    for-each-ref --merged HEAD       6.64s    1.66s   4.0x
    for-each-ref --no-merged HEAD    6.75s    1.74s   3.9x
    branch --merged HEAD             0.68s    0.61s   10%
    branch --no-merged HEAD          0.65s    0.61s   8%
    tag --merged HEAD                0.12s    0.12s   -

The large speedup for for-each-ref is because it checks all 10,000
refs as tips, making the O(T) inner loop expensive. The branch
subcommand only checks local branches (fewer tips), so the improvement
is smaller.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
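To make the complexity argument concrete, here is a minimal standalone sketch of the two lookup strategies. It does not use Git's decoration API: `struct toy_commit`, `struct ptr_map`, and every function name below are hypothetical stand-ins, and the hash map is a simple open-addressing table keyed on the commit pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a commit; this is not Git's struct commit. */
struct toy_commit {
	int id;
};

/* Before: linear scan, O(T) per visited commit. Returns tips_nr if absent. */
static size_t find_tip_linear(struct toy_commit **tips, size_t tips_nr,
			      const struct toy_commit *c)
{
	size_t j;

	for (j = 0; j < tips_nr; j++)
		if (tips[j] == c)
			return j;
	return tips_nr;
}

/* Hypothetical pointer-keyed map: an empty slot has a NULL key. */
struct ptr_map {
	const struct toy_commit **keys;
	size_t *vals;	/* index into the tips array */
	size_t cap;	/* always a power of two */
};

static size_t ptr_hash(const void *p, size_t cap)
{
	uintptr_t v = (uintptr_t)p;

	return (size_t)((v >> 4) * 2654435761u) & (cap - 1);
}

static void ptr_map_init(struct ptr_map *m, size_t nr)
{
	assert(nr < SIZE_MAX / 4);
	m->cap = 16;
	while (m->cap < nr * 2)	/* keep the load factor at or below 1/2 */
		m->cap *= 2;
	m->keys = calloc(m->cap, sizeof(*m->keys));
	m->vals = calloc(m->cap, sizeof(*m->vals));
	if (!m->keys || !m->vals)
		abort();
}

/* Built once up front: O(T) total for T tips. */
static void ptr_map_put(struct ptr_map *m, const struct toy_commit *k,
			size_t index)
{
	size_t i = ptr_hash(k, m->cap);

	while (m->keys[i] && m->keys[i] != k)
		i = (i + 1) & (m->cap - 1);
	m->keys[i] = k;
	m->vals[i] = index;
}

/* After: expected O(1) per visited commit. Returns tips_nr if absent. */
static size_t ptr_map_get(const struct ptr_map *m, const struct toy_commit *k,
			  size_t tips_nr)
{
	size_t i = ptr_hash(k, m->cap);

	while (m->keys[i]) {
		if (m->keys[i] == k)
			return m->vals[i];
		i = (i + 1) & (m->cap - 1);
	}
	return tips_nr;
}

static void ptr_map_free(struct ptr_map *m)
{
	free(m->keys);
	free(m->vals);
}
```

With the map filled once before the walk, each of the C visited commits costs one expected-constant-time probe instead of a scan over all T tips, which is where the O(C + T) figure comes from.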
/submit

Submitted as pull.2116.git.1778868463992.gitgitgadget@gmail.com

Jeff King wrote on the Git mailing list:

On Fri, May 15, 2026 at 06:07:43PM +0000, Kristofer Karlsson via GitGitGadget wrote:

> From: Kristofer Karlsson <krka@spotify.com>
>
> tips_reachable_from_bases() walks the commit graph from a set of base
> commits to find which tip commits are reachable. The inner loop does
> a linear scan over the tips array to check whether each visited commit
> is a tip, making the overall cost O(C * T) where C is commits walked
> and T is the number of tips.
>
> Replace the linear scan with the decoration hash for lookups, reducing
> the per-commit tip check from O(T) to O(1) and the overall cost from
> O(C * T) to O(C + T).
>
> This function is called by `git for-each-ref --merged` and
> `git branch/tag --contains/--no-contains` via reach_filter() in
> ref-filter.c.
>
> Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):
>
>     Command                          Before   After   Speedup
>     for-each-ref --merged HEAD       6.64s    1.66s   4.0x
>     for-each-ref --no-merged HEAD    6.75s    1.74s   3.9x
>     branch --merged HEAD             0.68s    0.61s   10%
>     branch --no-merged HEAD          0.65s    0.61s   8%
>     tag --merged HEAD                0.12s    0.12s   -
>
> The large speedup for for-each-ref is because it checks all 10,000
> refs as tips, making the O(T) inner loop expensive. The branch
> subcommand only checks local branches (fewer tips), so the improvement
> is smaller.

Hmm, I couldn't reproduce the speedup on something like linux.git (~1.4M
commits) with a lot of synthetic branches. I'd think that old branches
would be the most expensive, so I did:

    old=$(git rev-list --reverse HEAD | head -n1)
    seq --format="update refs/heads/branch%g $old" 10000 |
    git update-ref --stdin

Running "git for-each-ref --no-merged HEAD" takes ~650ms with stock Git.
But with your patch, it goes to ~830ms!

So what am I missing about your repo that it is so slow in the first
place?

> * Hacking the array index into the decoration value as (void *)(i + 1)
> instead of storing a proper pointer

The decoration API is not the most generic option here. There's an
oidmap type, but you have to embed the hashmap bits into your struct,
which is a lot of boilerplate if you're just storing an int. You can
define a khash with a custom value type, and I think the existing
oid_pos uses an int, which might be enough. All of those will store an
extra copy of the oid, though for the sizes we're talking about that's
not the end of the world.

Since we're always mapping commits, you could define a commit-slab (each
commit struct gets a unique id which we then index into a big array).
See commit-slab.h for an example.

I'm not very familiar with this code, but I wonder if we actually need
to map at all. It looks like we are mostly interested in set inclusion,
so perhaps an oidset() would work. Or even a bit in the object-flags.

-Peff
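
The "bit in the object-flags" idea can be sketched in isolation as follows. This is only an illustration of set inclusion via a flag bit on a toy graph: `struct toy_commit`, `TOY_TIP`, `TOY_SEEN`, and `count_reachable_tips()` are hypothetical names, not Git's `struct commit` or its real flag definitions, and the fixed-size stack is an assumption that only suits a tiny example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TIP     (1u << 0)	/* commit is one of the tips */
#define TOY_SEEN    (1u << 1)	/* commit was already queued by the walk */
#define MAX_PARENTS 2
#define MAX_STACK   64		/* toy limit, enough for the example graph */

/* Toy stand-in for a commit with an embedded flags word. */
struct toy_commit {
	unsigned flags;
	struct toy_commit *parents[MAX_PARENTS];
};

/*
 * Walk from base through parent links and count how many visited
 * commits carry TOY_TIP. The tip check is a single bit test, so no
 * separate map from commit to tip index is needed.
 */
static size_t count_reachable_tips(struct toy_commit *base)
{
	struct toy_commit *stack[MAX_STACK];
	size_t top = 0, found = 0;
	int i;

	base->flags |= TOY_SEEN;
	stack[top++] = base;
	while (top) {
		struct toy_commit *c = stack[--top];

		if (c->flags & TOY_TIP)
			found++;
		for (i = 0; i < MAX_PARENTS; i++) {
			struct toy_commit *p = c->parents[i];

			if (!p || (p->flags & TOY_SEEN))
				continue;
			p->flags |= TOY_SEEN;	/* mark on push so each commit is queued once */
			assert(top < MAX_STACK);
			stack[top++] = p;
		}
	}
	return found;
}
```

A bit test only answers "is this commit a tip?", not "which tip is it?", so it fits only if the caller needs membership rather than the array index. The sketch also leaves the flags set; a real implementation would have to clear them afterwards.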
This is a single small commit that replaces an O(C*T) linear scan in
`tips_reachable_from_bases()` with an O(1) lookup using the decoration hash.

The function is called by `git for-each-ref --merged` and
`git branch/tag --contains/--no-contains` via `reach_filter()` in
`ref-filter.c`. On a merge-heavy monorepo with 2.3M commits and 10,000
refs, `git for-each-ref --merged HEAD` goes from 6.6s to 1.7s (4x).

The diff is intentionally minimal (+9/-6) to make the idea easy to
discuss before polishing. Things I'm not fully happy about:

* [...] `{ }` just to preserve indentation of the inner body
* Hacking the array index into the decoration value as `(void *)(i + 1)` instead of storing a proper pointer
* [...] `- 1` on a `size_t` [...] `0`) to check for not-found via `j < tips_nr`

Happy to clean all of these up in a follow-up commit if the approach
makes sense.

cc: Jeff King <peff@peff.net>
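
For readers unfamiliar with the `(void *)(i + 1)` trick mentioned in the list above, the following standalone sketch shows one way such an encoding can work. It is an interpretation of the surviving fragments (`- 1` on a `size_t`, the `j < tips_nr` check), not the patch itself, and the helper names `encode_index()`, `decode_index()`, and `is_found()` are hypothetical. The premise is that a lookup returns NULL when there is no entry, so index 0 must not be stored as a null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store index i as a non-NULL pointer value by offsetting it by one. */
static void *encode_index(size_t i)
{
	assert(i < SIZE_MAX);	/* i + 1 must not wrap to 0 */
	return (void *)(uintptr_t)(i + 1);
}

/*
 * Undo the offset. For a NULL value (assumed to convert to 0, as on
 * common platforms) the unsigned subtraction wraps to SIZE_MAX.
 */
static size_t decode_index(void *value)
{
	return (size_t)(uintptr_t)value - 1;
}

/*
 * A single bounds check covers both cases: a stored index is below
 * tips_nr, while the wrapped SIZE_MAX from a missing entry is not.
 */
static int is_found(void *value, size_t tips_nr)
{
	return decode_index(value) < tips_nr;
}
```

The wraparound is well defined for unsigned types, but it is implicit, which is presumably why the author lists it as something to clean up; an explicit NULL check before decoding would express the same thing more directly.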