commit-reach: use the decoration hash for tips_reachable_from_bases() #2116

Open
spkrka wants to merge 1 commit into gitgitgadget:master from spkrka:tips-reachable-minimal

spkrka commented May 15, 2026

This is a single small commit that replaces an O(C*T) linear scan in
tips_reachable_from_bases() with an O(1) lookup using the decoration hash.

The function is called by git for-each-ref --merged and
git branch/tag --contains/--no-contains via reach_filter() in
ref-filter.c. On a merge-heavy monorepo with 2.3M commits and 10,000
refs, git for-each-ref --merged HEAD goes from 6.6s to 1.7s (4x).

The diff is intentionally minimal (+9/-6) to make the idea easy to
discuss before polishing. Things I'm not fully happy about:

  • Extra block scope { } just to preserve indentation of the inner body
  • Hacking the array index into the decoration value as (void *)(i + 1)
    instead of storing a proper pointer
  • Relying on unsigned wraparound (- 1 on a size_t 0) to check for
    not-found via j < tips_nr

Happy to clean all of these up in a follow-up commit if the approach
makes sense.
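
For readers unfamiliar with the trick in the second and third items above, here is a minimal sketch of how an array index can be packed into a pointer-sized value so that a NULL lookup result falls out as "not found". The helper names are hypothetical and are not taken from the patch; only the `(void *)(i + 1)` encoding and the `j < tips_nr` check come from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of Git or the patch. */

/* Store index i as a non-NULL pointer value by offsetting it by one. */
static void *encode_index(size_t i)
{
	assert(i < SIZE_MAX);	/* i + 1 must not wrap to 0 (NULL) */
	return (void *)(uintptr_t)(i + 1);
}

/*
 * Recover the index. A NULL lookup result (key absent) decodes to
 * 0 - 1 == SIZE_MAX, because unsigned arithmetic wraps around.
 */
static size_t decode_index(const void *value)
{
	return (size_t)(uintptr_t)value - 1;
}

/* One bounds check then covers both "absent" and "out of range". */
static int is_valid_index(const void *value, size_t nr)
{
	return decode_index(value) < nr;
}
```

The wraparound is well defined for unsigned types in C, but the intent is easy to miss when reading the caller, which is presumably why it is listed as something to clean up.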

cc: Jeff King <peff@peff.net>

tips_reachable_from_bases() walks the commit graph from a set of base
commits to find which tip commits are reachable.  The inner loop does
a linear scan over the tips array to check whether each visited commit
is a tip, making the overall cost O(C * T) where C is commits walked
and T is the number of tips.

Replace the linear scan with the decoration hash for lookups, reducing
the per-commit tip check from O(T) to O(1) and the overall cost from
O(C * T) to O(C + T).
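
As a rough illustration of the change described above, the sketch below contrasts a per-commit linear scan with a pointer-keyed hash lookup. It assumes a small stand-in map (`ptr_map`, a hypothetical name) rather than Git's decoration API, and it does not reproduce the actual loop in commit-reach.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pointer-keyed map; NOT Git's decorate.h API. */
struct ptr_map {
	const void **keys;	/* NULL marks an empty slot */
	size_t *values;
	size_t cap;		/* power of two, kept at most half full */
};

static size_t slot_for(const struct ptr_map *m, const void *key)
{
	/* Drop low alignment bits, then spread with a multiplicative hash. */
	uintptr_t h = ((uintptr_t)key >> 3) * (uintptr_t)2654435761u;
	return (size_t)h & (m->cap - 1);
}

/* Size the table for at most `entries` keys; it never grows afterwards. */
static void ptr_map_init(struct ptr_map *m, size_t entries)
{
	m->cap = 8;
	while (m->cap < 2 * entries)
		m->cap *= 2;
	m->keys = calloc(m->cap, sizeof(*m->keys));
	m->values = calloc(m->cap, sizeof(*m->values));
	if (!m->keys || !m->values)
		abort();
}

static void ptr_map_put(struct ptr_map *m, const void *key, size_t value)
{
	size_t i;

	assert(key);	/* NULL is reserved for empty slots */
	i = slot_for(m, key);
	while (m->keys[i] && m->keys[i] != key)
		i = (i + 1) & (m->cap - 1);	/* linear probing */
	m->keys[i] = key;
	m->values[i] = value;
}

/* Expected O(1): returns 1 and sets *value if key is present, else 0. */
static int ptr_map_get(const struct ptr_map *m, const void *key, size_t *value)
{
	size_t i = slot_for(m, key);

	while (m->keys[i]) {
		if (m->keys[i] == key) {
			*value = m->values[i];
			return 1;
		}
		i = (i + 1) & (m->cap - 1);
	}
	return 0;
}

static void ptr_map_release(struct ptr_map *m)
{
	free(m->keys);
	free(m->values);
	m->keys = NULL;
	m->values = NULL;
	m->cap = 0;
}

/* O(T) per visited commit: the shape of the scan being replaced. */
static int find_tip_linear(const void **tips, size_t nr, const void *c, size_t *pos)
{
	for (size_t j = 0; j < nr; j++) {
		if (tips[j] == c) {
			*pos = j;
			return 1;
		}
	}
	return 0;
}
```

Filling the map costs one `ptr_map_put()` per tip, and each visited commit then needs one `ptr_map_get()`, which is where the O(C + T) figure comes from. The constant factor of hashing is not free, so a scan that usually exits early can still win on some inputs.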

This function is called by `git for-each-ref --merged` and
`git branch/tag --contains/--no-contains` via reach_filter() in
ref-filter.c.

Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):

  Command                           Before    After   Speedup
  for-each-ref --merged HEAD        6.64s     1.66s     4.0x
  for-each-ref --no-merged HEAD     6.75s     1.74s     3.9x
  branch --merged HEAD              0.68s     0.61s      10%
  branch --no-merged HEAD           0.65s     0.61s       8%
  tag --merged HEAD                 0.12s     0.12s       -

The large speedup for for-each-ref is because it checks all 10,000
refs as tips, making the O(T) inner loop expensive.  The branch
subcommand only checks local branches (fewer tips), so the improvement
is smaller.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>

spkrka marked this pull request as ready for review May 15, 2026 18:06

spkrka commented May 15, 2026

/submit

gitgitgadget Bot commented May 15, 2026

Submitted as pull.2116.git.1778868463992.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2116/spkrka/tips-reachable-minimal-v1

To fetch this version to local tag pr-2116/spkrka/tips-reachable-minimal-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2116/spkrka/tips-reachable-minimal-v1

gitgitgadget Bot commented May 15, 2026

Jeff King wrote on the Git mailing list:

On Fri, May 15, 2026 at 06:07:43PM +0000, Kristofer Karlsson via GitGitGadget wrote:

> From: Kristofer Karlsson <krka@spotify.com>
> 
> tips_reachable_from_bases() walks the commit graph from a set of base
> commits to find which tip commits are reachable.  The inner loop does
> a linear scan over the tips array to check whether each visited commit
> is a tip, making the overall cost O(C * T) where C is commits walked
> and T is the number of tips.
> 
> Replace the linear scan with the decoration hash for lookups, reducing
> the per-commit tip check from O(T) to O(1) and the overall cost from
> O(C * T) to O(C + T).
> 
> This function is called by `git for-each-ref --merged` and
> `git branch/tag --contains/--no-contains` via reach_filter() in
> ref-filter.c.
> 
> Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):
> 
>   Command                           Before    After   Speedup
>   for-each-ref --merged HEAD        6.64s     1.66s     4.0x
>   for-each-ref --no-merged HEAD     6.75s     1.74s     3.9x
>   branch --merged HEAD              0.68s     0.61s      10%
>   branch --no-merged HEAD           0.65s     0.61s       8%
>   tag --merged HEAD                 0.12s     0.12s       -
> 
> The large speedup for for-each-ref is because it checks all 10,000
> refs as tips, making the O(T) inner loop expensive.  The branch
> subcommand only checks local branches (fewer tips), so the improvement
> is smaller.

Hmm, I couldn't reproduce the speedup on something like linux.git (~1.4M
commits) with a lot of synthetic branches. I'd think that old branches
would be the most expensive, so I did:

  old=$(git rev-list --reverse HEAD | head -n1)
  seq --format="update refs/heads/branch%g $old" 10000 |
  git update-ref --stdin

Running "git for-each-ref --no-merged HEAD" takes ~650ms with stock Git.
But with your patch, it goes to ~830ms!

So what am I missing about your repo that it is so slow in the first
place?

>      * Hacking the array index into the decoration value as (void *)(i + 1)
>        instead of storing a proper pointer

The decoration API is not the most generic option here. There's an
oidmap type, but you have to embed the hashmap bits into your struct,
which is a lot of boilerplate if you're just storing an int. You can
define a khash with a custom value type, and I think the existing
oid_pos uses an int, which might be enough. All of those will store an
extra copy of the oid, though for the sizes we're talking about that's
not the end of the world.

Since we're always mapping commits, you could define a commit-slab (each
commit struct gets a unique id which we then index into a big array).
See commit-slab.h for an example.
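
To make the slab idea concrete, here is a simplified sketch of per-commit storage addressed by a unique commit index. It assumes toy types (`toy_commit`, `tip_slab`) and hand-written accessors. The real commit-slab.h generates its accessors with a macro and allocates in chunks, which this sketch does not attempt to reproduce.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for Git's struct commit; only the unique index matters here. */
struct toy_commit {
	unsigned int index;
};

/* Hypothetical slab: one size_t per commit, addressed by commit index. */
struct tip_slab {
	size_t *slots;	/* 0 = not a tip, otherwise tip position + 1 */
	size_t nr;
};

/* Return the slot for c, growing the zero-filled array as needed. */
static size_t *tip_slab_at(struct tip_slab *s, const struct toy_commit *c)
{
	if (c->index >= s->nr) {
		size_t new_nr = (size_t)c->index + 1;
		size_t *p = realloc(s->slots, new_nr * sizeof(*p));

		if (!p)
			abort();
		memset(p + s->nr, 0, (new_nr - s->nr) * sizeof(*p));
		s->slots = p;
		s->nr = new_nr;
	}
	return &s->slots[c->index];
}

/* Read-only lookup: returns NULL instead of growing the slab. */
static const size_t *tip_slab_peek(const struct tip_slab *s,
				   const struct toy_commit *c)
{
	return c->index < s->nr ? &s->slots[c->index] : NULL;
}

static void tip_slab_clear(struct tip_slab *s)
{
	free(s->slots);
	s->slots = NULL;
	s->nr = 0;
}
```

A lookup is a plain array index with no hashing and no second copy of the oid, at the cost of memory proportional to the highest commit index touched.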

I'm not very familiar with this code, but I wonder if we actually need
to map at all. It looks like we are mostly interested in set inclusion,
so perhaps an oidset would work. Or even a bit in the object-flags.
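
A sketch of the flag-bit variant of that idea, assuming a toy object type and a made-up flag name (`TOY_TIP`). It shows set inclusion only, with no mapping back to an array position, and is not taken from Git's object-flag allocation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TIP (1u << 0)	/* hypothetical flag bit, not one of Git's */

/* Stand-in for Git's struct object; only the flags word matters here. */
struct toy_object {
	unsigned int flags;
};

/* Mark every tip so membership becomes a single bit test. */
static void mark_tips(struct toy_object **tips, size_t nr)
{
	assert(tips || !nr);
	for (size_t i = 0; i < nr; i++)
		tips[i]->flags |= TOY_TIP;
}

/*
 * Count walked objects that are tips. The bit is cleared on the first
 * hit so a tip reached along several paths is counted once.
 */
static size_t count_reached_tips(struct toy_object **walked, size_t nr)
{
	size_t reached = 0;

	for (size_t i = 0; i < nr; i++) {
		if (walked[i]->flags & TOY_TIP) {
			walked[i]->flags &= ~TOY_TIP;
			reached++;
		}
	}
	return reached;
}

/* Flag bits are shared state, so clear any that remain set. */
static void clear_tips(struct toy_object **tips, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		tips[i]->flags &= ~TOY_TIP;
}
```

This only answers "is this commit a tip?". If the caller needs the tip's position in the array, some form of map is still required.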

-Peff

gitgitgadget Bot commented May 15, 2026

User Jeff King <peff@peff.net> has been added to the cc: list.
