From b767867fae892f15cdfd1466983fe4be46014a36 Mon Sep 17 00:00:00 2001 From: Julia Evans Date: Mon, 5 Jan 2026 16:48:15 -0500 Subject: [PATCH 01/50] doc: git-reset: reorder the forms From user feedback: three users commented that the `git reset [mode]` form is the one that they primarily use, and that they were suprised to see it listed last. ("I've never used git reset in any mode other than --hard"). Move it to be first, since the `git reset [mode]` form is what "Reset current HEAD to the specified state" at the beginning refers to, and because the `git reset [mode]` form is the only thing that `git reset` uniquely does, the others could also be done with `git restore`. Signed-off-by: Julia Evans Signed-off-by: Junio C Hamano Signed-off-by: D. Ben Knoble Signed-off-by: Junio C Hamano --- Documentation/git-reset.adoc | 58 ++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/Documentation/git-reset.adoc b/Documentation/git-reset.adoc index 3b9ba9aee95203..9843682e81cb5e 100644 --- a/Documentation/git-reset.adoc +++ b/Documentation/git-reset.adoc @@ -8,43 +8,17 @@ git-reset - Reset current HEAD to the specified state SYNOPSIS -------- [synopsis] +git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [] git reset [-q] [] [--] ... git reset [-q] [--pathspec-from-file= [--pathspec-file-nul]] [] git reset (--patch | -p) [] [--] [...] -git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [] DESCRIPTION ----------- -In the first three forms, copy entries from __ to the index. -In the last form, set the current branch head (`HEAD`) to __, +In the first form, set the current branch head (`HEAD`) to __, optionally modifying index and working tree to match. The __/__ defaults to `HEAD` in all forms. - -`git reset [-q] [] [--] ...`:: -`git reset [-q] [--pathspec-from-file= [--pathspec-file-nul]] []`:: - These forms reset the index entries for all paths that match the - __ to their state at __. (It does not affect - the working tree or the current branch.) -+ -This means that `git reset ` is the opposite of `git add -`. This command is equivalent to -`git restore [--source=] --staged ...`. -+ -After running `git reset ` to update the index entry, you can -use linkgit:git-restore[1] to check the contents out of the index to -the working tree. Alternatively, using linkgit:git-restore[1] -and specifying a commit with `--source`, you -can copy the contents of a path out of a commit to the index and to the -working tree in one go. - -`git reset (--patch | -p) [] [--] [...]`:: - Interactively select hunks in the difference between the index - and __ (defaults to `HEAD`). The chosen hunks are applied - in reverse to the index. -+ -This means that `git reset -p` is the opposite of `git add -p`, i.e. -you can use it to selectively reset hunks. See the "Interactive Mode" -section of linkgit:git-add[1] to learn how to operate the `--patch` mode. +In the last three forms, copy entries from __ to the index. `git reset [] []`:: This form resets the current branch head to __ and @@ -98,6 +72,32 @@ but carries forward unmerged index entries. the submodules' `HEAD` to be detached at that commit. -- +`git reset [-q] [] [--] ...`:: +`git reset [-q] [--pathspec-from-file= [--pathspec-file-nul]] []`:: + These forms reset the index entries for all paths that match the + __ to their state at __. (It does not affect + the working tree or the current branch.) ++ +This means that `git reset ` is the opposite of `git add +`. This command is equivalent to +`git restore [--source=] --staged ...`. ++ +After running `git reset ` to update the index entry, you can +use linkgit:git-restore[1] to check the contents out of the index to +the working tree. Alternatively, using linkgit:git-restore[1] +and specifying a commit with `--source`, you +can copy the contents of a path out of a commit to the index and to the +working tree in one go. + +`git reset (--patch | -p) [] [--] [...]`:: + Interactively select hunks in the difference between the index + and __ (defaults to `HEAD`). The chosen hunks are applied + in reverse to the index. ++ +This means that `git reset -p` is the opposite of `git add -p`, i.e. +you can use it to selectively reset hunks. See the "Interactive Mode" +section of linkgit:git-add[1] to learn how to operate the `--patch` mode. + See "Reset, restore and revert" in linkgit:git[1] for the differences between the three commands. From 296834217d61b03e543863bd250c56a302a7ead2 Mon Sep 17 00:00:00 2001 From: Julia Evans Date: Mon, 5 Jan 2026 16:48:16 -0500 Subject: [PATCH 02/50] doc: git-reset: clarify intro From user feedback, there were several points of confusion: - What "tree-ish", "entries", "working tree", "HEAD", and "index" mean ("I have no clue what the index is", "I've been using git for 20 years and still don't know what a tree-ish is"). Avoid using these terms where it makes sense. - What "optionally modifying index and working tree to match" means ("to match what?" "optionally based on what?") Remove this from the intro, we can say it later when giving more details. - One user suggested that "The / defaults to HEAD in all forms." should be repeated later on, since it's easy to miss. Instead say that HEAD is the default in each case later. Another issue is that `git reset` consistently describes the action it does as "Reset ...", commands should not use their name to describe themselves, and that the word "mode" is used to mean several different things on this page. Address these by being more clear about two use cases for `git reset` ("to undo operations" and "to update staged files"), and explaining what the conditions are for each case instead of forcing the user to figure out the pattern is in first form vs the other 3 forms. Signed-off-by: Julia Evans Signed-off-by: Junio C Hamano Signed-off-by: D. Ben Knoble Signed-off-by: Junio C Hamano --- Documentation/git-reset.adoc | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/Documentation/git-reset.adoc b/Documentation/git-reset.adoc index 9843682e81cb5e..91dc6e62782fe2 100644 --- a/Documentation/git-reset.adoc +++ b/Documentation/git-reset.adoc @@ -3,7 +3,7 @@ git-reset(1) NAME ---- -git-reset - Reset current HEAD to the specified state +git-reset - Set `HEAD` or the index to a known state SYNOPSIS -------- @@ -15,10 +15,13 @@ git reset (--patch | -p) [] [--] [...] DESCRIPTION ----------- -In the first form, set the current branch head (`HEAD`) to __, -optionally modifying index and working tree to match. -The __/__ defaults to `HEAD` in all forms. -In the last three forms, copy entries from __ to the index. +`git reset` does either of the following: + +1. `git reset [] ` changes which commit `HEAD` points to. This + makes it possible to undo various Git operations, for example commit, merge, + rebase, and pull. +2. When you specify files or directories or pass `--patch`, `git reset` updates + the staged version of the specified files. `git reset [] []`:: This form resets the current branch head to __ and From 7fb080a790b6410f8e36dbc10c5d9be0ff47ce98 Mon Sep 17 00:00:00 2001 From: Julia Evans Date: Mon, 5 Jan 2026 16:48:17 -0500 Subject: [PATCH 03/50] doc: git-reset: clarify `git reset [mode]` From user feedback, there was some confusion about the differences between the modes, including: 1. Sometimes it says "index" and sometimes "index file". Fix by replacing "index file" with "index". 2. Many comments about not being able to understand what `--merge` does. Fix by mentioning obscure situations, since that seems to be what it's for. Most folks will use `git --abort`. 3. Issues telling the difference between --soft and --mixed, as well as --keep. Leave --keep alone because I couldn't understand its use case, but change `--soft` / `--mixed` / `--hard` as follows: --mixed is the default, so put it first. Describe --soft/--mixed/--hard with the following structure: * Start by saying what happens to the files in the working directory, because the thing users want to avoid most is irretrievably losing changes to their working directory files. * Then describe what happens to the staging area. Right now it seems to frame leaving the index alone as being a sort of neutral action. I think this is part of what's confusing users, because in Git when you update HEAD, Git almost always updates the index to match HEAD. So leaving the index unchanged while updating HEAD is actually quite unusual, and it deserves to be flagged. * Finally, give an example for --soft to explain a common use case. Signed-off-by: Julia Evans Signed-off-by: Junio C Hamano Signed-off-by: D. Ben Knoble Signed-off-by: Junio C Hamano --- Documentation/git-reset.adoc | 50 +++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 23 deletions(-) diff --git a/Documentation/git-reset.adoc b/Documentation/git-reset.adoc index 91dc6e62782fe2..37c868ae24de9c 100644 --- a/Documentation/git-reset.adoc +++ b/Documentation/git-reset.adoc @@ -24,42 +24,46 @@ DESCRIPTION the staged version of the specified files. `git reset [] []`:: - This form resets the current branch head to __ and - possibly updates the index (resetting it to the tree of __) and - the working tree depending on __. Before the operation, `ORIG_HEAD` - is set to the tip of the current branch. If __ is omitted, - defaults to `--mixed`. The __ must be one of the following: + Set the current branch head (`HEAD`) to point at __. + Depending on __, also update the working directory and/or index + to match the contents of __. + __ defaults to `HEAD`. + Before the operation, `ORIG_HEAD` is set to the tip of the current branch. ++ +The __ must be one of the following (default `--mixed`): + --- -`--soft`:: - Does not touch the index file or the working tree at all (but - resets the head to __, just like all modes do). This leaves - all your changed files "Changes to be committed", as `git status` - would put it. +-- `--mixed`:: - Resets the index but not the working tree (i.e., the changed files - are preserved but not marked for commit) and reports what has not - been updated. This is the default action. + Leave your working directory unchanged. + Update the index to match the new `HEAD`, so nothing will be staged. + -If `-N` is specified, removed paths are marked as intent-to-add (see +If `-N` is specified, mark removed paths as intent-to-add (see linkgit:git-add[1]). +`--soft`:: + Leave your working tree files and the index unchanged. + For example, if you have no staged changes, you can use + `git reset --soft HEAD~5; git commit` + to combine the last 5 commits into 1 commit. This works even with + changes in the working tree, which are left untouched, but such usage + can lead to confusion. + `--hard`:: - Resets the index and working tree. Any changes to tracked files in the - working tree since __ are discarded. Any untracked files or - directories in the way of writing any tracked files are simply deleted. + Overwrite all files and directories with the version from __, + and may overwrite untracked files. Tracked files not in __ are + removed so that the working tree matches __. + Update the index to match the new `HEAD`, so nothing will be staged. `--merge`:: - Resets the index and updates the files in the working tree that are - different between __ and `HEAD`, but keeps those which are + Reset the index and update the files in the working tree that are + different between __ and `HEAD`, but keep those which are different between the index and working tree (i.e. which have changes which have not been added). + Mainly exists to reset unmerged index entries, like those left behind by + `git am -3` or `git switch -m` in certain situations. If a file that is different between __ and the index has unstaged changes, reset is aborted. -+ -In other words, `--merge` does something like a `git read-tree -u -m `, -but carries forward unmerged index entries. `--keep`:: Resets index entries and updates files in the working tree that are From 555c8464e5949fcd35ec9878f83874a998beb787 Mon Sep 17 00:00:00 2001 From: Julia Evans Date: Mon, 5 Jan 2026 16:48:18 -0500 Subject: [PATCH 04/50] doc: git-reset: clarify `git reset ` From user feedback: - Continued confusion about the terms "tree-ish" and "pathspec" - The word "hunks" is confusing folks, use "changes" instead. - On the part about `git restore`, there were a few comments to the effect of "wait, this doesn't actually update any files? What? Why?" Be more direct that `git reset` does not update files: there's no obvious reason to suggest that folks use `git reset` followed by `git restore`, instead suggest just using `git restore`. Continue avoiding the use of the word "reset" to describe what "git reset" does. Signed-off-by: Julia Evans Signed-off-by: Junio C Hamano Signed-off-by: D. Ben Knoble Signed-off-by: Junio C Hamano --- Documentation/git-reset.adoc | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/Documentation/git-reset.adoc b/Documentation/git-reset.adoc index 37c868ae24de9c..5023b5069972ca 100644 --- a/Documentation/git-reset.adoc +++ b/Documentation/git-reset.adoc @@ -81,29 +81,25 @@ linkgit:git-add[1]). `git reset [-q] [] [--] ...`:: `git reset [-q] [--pathspec-from-file= [--pathspec-file-nul]] []`:: - These forms reset the index entries for all paths that match the - __ to their state at __. (It does not affect - the working tree or the current branch.) + For all specified files or directories, set the staged version to + the version from the given commit or tree (which defaults to `HEAD`). + This means that `git reset ` is the opposite of `git add -`. This command is equivalent to -`git restore [--source=] --staged ...`. +`: it unstages all changes to the specified file(s) or +directories. This is equivalent to `git restore --staged ...`. + -After running `git reset ` to update the index entry, you can -use linkgit:git-restore[1] to check the contents out of the index to -the working tree. Alternatively, using linkgit:git-restore[1] -and specifying a commit with `--source`, you -can copy the contents of a path out of a commit to the index and to the -working tree in one go. +In this mode, `git reset` updates only the index (without updating the `HEAD` or +working tree files). If you want to update the files as well as the index +entries, use linkgit:git-restore[1]. `git reset (--patch | -p) [] [--] [...]`:: - Interactively select hunks in the difference between the index - and __ (defaults to `HEAD`). The chosen hunks are applied - in reverse to the index. + Interactively select changes from the difference between the index + and the specified commit or tree (which defaults to `HEAD`). + The index is modified using the chosen changes. + This means that `git reset -p` is the opposite of `git add -p`, i.e. -you can use it to selectively reset hunks. See the "Interactive Mode" -section of linkgit:git-add[1] to learn how to operate the `--patch` mode. +you can use it to selectively unstage changes. See the "Interactive Mode" +section of linkgit:git-add[1] to learn how to use the `--patch` option. See "Reset, restore and revert" in linkgit:git[1] for the differences between the three commands. From 480336a9cec8701103a815289304ea626416043a Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:09 +0100 Subject: [PATCH 05/50] packfile: create store via its owning source In subsequent patches we're about to move the packfile store from the object database layer into the object database source layer. Once done, we'll have one packfile store per source, where the source is owning the store. Prepare for this future and refactor `packfile_store_new()` to be initialized via an object database source instead of via the object database itself. This refactoring leads to a weird in-between state where the store is owned by the object database but created via the source. But this makes subsequent refactorings easier because we can now start to access the owning source of a given store. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- odb.c | 2 +- packfile.c | 20 ++++++++++---------- packfile.h | 6 +++--- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/odb.c b/odb.c index 45b6600800560d..94144a69f532e9 100644 --- a/odb.c +++ b/odb.c @@ -1056,7 +1056,6 @@ struct object_database *odb_new(struct repository *repo, memset(o, 0, sizeof(*o)); o->repo = repo; - o->packfiles = packfile_store_new(o); pthread_mutex_init(&o->replace_mutex, NULL); string_list_init_dup(&o->submodule_source_paths); @@ -1065,6 +1064,7 @@ struct object_database *odb_new(struct repository *repo, o->sources = odb_source_new(o, primary_source, true); o->sources_tail = &o->sources->next; o->alternate_db = xstrdup_or_null(secondary_sources); + o->packfiles = packfile_store_new(o->sources); free(to_free); diff --git a/packfile.c b/packfile.c index c88bd92619913b..0a05a10daaccb1 100644 --- a/packfile.c +++ b/packfile.c @@ -876,7 +876,7 @@ struct packed_git *packfile_store_load_pack(struct packfile_store *store, p = strmap_get(&store->packs_by_path, key.buf); if (!p) { - p = add_packed_git(store->odb->repo, idx_path, + p = add_packed_git(store->source->odb->repo, idx_path, strlen(idx_path), local); if (p) packfile_store_add_pack(store, p); @@ -1068,8 +1068,8 @@ void packfile_store_prepare(struct packfile_store *store) if (store->initialized) return; - odb_prepare_alternates(store->odb); - for (source = store->odb->sources; source; source = source->next) { + odb_prepare_alternates(store->source->odb); + for (source = store->source->odb->sources; source; source = source->next) { prepare_multi_pack_index_one(source); prepare_packed_git_one(source); } @@ -1092,7 +1092,7 @@ struct packfile_list_entry *packfile_store_get_packs(struct packfile_store *stor { packfile_store_prepare(store); - for (struct odb_source *source = store->odb->sources; source; source = source->next) { + for (struct odb_source *source = store->source->odb->sources; source; source = source->next) { struct multi_pack_index *m = source->midx; if (!m) continue; @@ -2121,7 +2121,7 @@ int packfile_store_freshen_object(struct packfile_store *store, const struct object_id *oid) { struct pack_entry e; - if (!find_pack_entry(store->odb->repo, oid, &e)) + if (!find_pack_entry(store->source->odb->repo, oid, &e)) return 0; if (e.p->is_cruft) return 0; @@ -2142,7 +2142,7 @@ int packfile_store_read_object_info(struct packfile_store *store, struct pack_entry e; int rtype; - if (!find_pack_entry(store->odb->repo, oid, &e)) + if (!find_pack_entry(store->source->odb->repo, oid, &e)) return 1; /* @@ -2152,7 +2152,7 @@ int packfile_store_read_object_info(struct packfile_store *store, if (oi == &blank_oi) return 0; - rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi); + rtype = packed_object_info(store->source->odb->repo, e.p, e.offset, oi); if (rtype < 0) { mark_bad_packed_object(e.p, oid); return -1; @@ -2411,11 +2411,11 @@ int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *l return 0; } -struct packfile_store *packfile_store_new(struct object_database *odb) +struct packfile_store *packfile_store_new(struct odb_source *source) { struct packfile_store *store; CALLOC_ARRAY(store, 1); - store->odb = odb; + store->source = source; strmap_init(&store->packs_by_path); return store; } @@ -2534,7 +2534,7 @@ int packfile_store_read_object_stream(struct odb_read_stream **out, if (packfile_store_read_object_info(store, oid, &oi, 0) || oi.u.packed.is_delta || - repo_settings_get_big_file_threshold(store->odb->repo) >= size) + repo_settings_get_big_file_threshold(store->source->odb->repo) >= size) return -1; in_pack_type = unpack_object_header(oi.u.packed.pack, diff --git a/packfile.h b/packfile.h index 59d162a3f415e5..33cc1c1654f081 100644 --- a/packfile.h +++ b/packfile.h @@ -77,7 +77,7 @@ struct packed_git *packfile_list_find_oid(struct packfile_list_entry *packs, * A store that manages packfiles for a given object database. */ struct packfile_store { - struct object_database *odb; + struct odb_source *source; /* * The list of packfiles in the order in which they have been most @@ -129,9 +129,9 @@ struct packfile_store { /* * Allocate and initialize a new empty packfile store for the given object - * database. + * database source. */ -struct packfile_store *packfile_store_new(struct object_database *odb); +struct packfile_store *packfile_store_new(struct odb_source *source); /* * Free the packfile store and all its associated state. All packfiles From 0316c63ca4fc0d58ecd02243c62253b246fd046a Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:10 +0100 Subject: [PATCH 06/50] packfile: pass source to `prepare_pack()` When preparing a packfile we pass various pieces attached to the pack's object database source via the `struct prepare_pack_data`. Refactor this code to instead pass in the source directly. This reduces the number of variables we need to pass and allows for a subsequent refactoring where we start to prepare the pack via the source. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/packfile.c b/packfile.c index 0a05a10daaccb1..ab86afa01d7a99 100644 --- a/packfile.c +++ b/packfile.c @@ -975,10 +975,8 @@ void for_each_file_in_pack_dir(const char *objdir, } struct prepare_pack_data { - struct repository *r; + struct odb_source *source; struct string_list *garbage; - int local; - struct multi_pack_index *m; }; static void prepare_pack(const char *full_name, size_t full_name_len, @@ -988,10 +986,10 @@ static void prepare_pack(const char *full_name, size_t full_name_len, size_t base_len = full_name_len; if (strip_suffix_mem(full_name, &base_len, ".idx") && - !(data->m && midx_contains_pack(data->m, file_name))) { + !(data->source->midx && midx_contains_pack(data->source->midx, file_name))) { char *trimmed_path = xstrndup(full_name, full_name_len); - packfile_store_load_pack(data->r->objects->packfiles, - trimmed_path, data->local); + packfile_store_load_pack(data->source->odb->packfiles, + trimmed_path, data->source->local); free(trimmed_path); } @@ -1020,10 +1018,8 @@ static void prepare_packed_git_one(struct odb_source *source) { struct string_list garbage = STRING_LIST_INIT_DUP; struct prepare_pack_data data = { - .m = source->midx, - .r = source->odb->repo, + .source = source, .garbage = &garbage, - .local = source->local, }; for_each_file_in_pack_dir(source->path, prepare_pack, &data); From 085de91b951a40b2b8ce35f8bfa182d4f5bcea6b Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:11 +0100 Subject: [PATCH 07/50] packfile: refactor kept-pack cache to work with packfile stores The kept pack cache is a cache of packfiles that are marked as kept either via an accompanying ".kept" file or via an in-memory flag. The cache can be retrieved via `kept_pack_cache()`, where one needs to pass in a repository. Ultimately though the kept-pack cache is a property of the packfile store, and this causes problems in a subsequent commit where we want to move down the packfile store to be a per-object-source entity. Prepare for this and refactor the kept-pack cache to work on top of a packfile store instead. While at it, rename both the function and flags specific to the kept-pack cache so that they can be properly attributed to the respective subsystems. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 12 ++++++------ packfile.c | 37 ++++++++++++++++++++----------------- packfile.h | 25 +++++++++++++++++-------- reachable.c | 2 +- revision.c | 8 ++++---- 5 files changed, 48 insertions(+), 36 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 1ce8d6ee215326..e86b8f387a8c64 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1529,9 +1529,9 @@ static int want_cruft_object_mtime(struct repository *r, const struct object_id *oid, unsigned flags, uint32_t mtime) { - struct packed_git **cache; + struct packed_git **cache = packfile_store_get_kept_pack_cache(r->objects->packfiles, flags); - for (cache = kept_pack_cache(r, flags); *cache; cache++) { + for (; *cache; cache++) { struct packed_git *p = *cache; off_t ofs; uint32_t candidate_mtime; @@ -1624,9 +1624,9 @@ static int want_found_object(const struct object_id *oid, int exclude, */ unsigned flags = 0; if (ignore_packed_keep_on_disk) - flags |= ON_DISK_KEEP_PACKS; + flags |= KEPT_PACK_ON_DISK; if (ignore_packed_keep_in_core) - flags |= IN_CORE_KEEP_PACKS; + flags |= KEPT_PACK_IN_CORE; /* * If the object is in a pack that we want to ignore, *and* we @@ -3931,7 +3931,7 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked) * an optimization during delta selection. */ revs.no_kept_objects = 1; - revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS; + revs.keep_pack_cache_flags |= KEPT_PACK_IN_CORE; revs.blob_objects = 1; revs.tree_objects = 1; revs.tag_objects = 1; @@ -4030,7 +4030,7 @@ static void show_cruft_commit(struct commit *commit, void *data) static int cruft_include_check_obj(struct object *obj, void *data UNUSED) { - return !has_object_kept_pack(to_pack.repo, &obj->oid, IN_CORE_KEEP_PACKS); + return !has_object_kept_pack(to_pack.repo, &obj->oid, KEPT_PACK_IN_CORE); } static int cruft_include_check(struct commit *commit, void *data) diff --git a/packfile.c b/packfile.c index ab86afa01d7a99..191344eb1cf346 100644 --- a/packfile.c +++ b/packfile.c @@ -2164,25 +2164,26 @@ int packfile_store_read_object_info(struct packfile_store *store, return 0; } -static void maybe_invalidate_kept_pack_cache(struct repository *r, +static void maybe_invalidate_kept_pack_cache(struct packfile_store *store, unsigned flags) { - if (!r->objects->packfiles->kept_cache.packs) + if (!store->kept_cache.packs) return; - if (r->objects->packfiles->kept_cache.flags == flags) + if (store->kept_cache.flags == flags) return; - FREE_AND_NULL(r->objects->packfiles->kept_cache.packs); - r->objects->packfiles->kept_cache.flags = 0; + FREE_AND_NULL(store->kept_cache.packs); + store->kept_cache.flags = 0; } -struct packed_git **kept_pack_cache(struct repository *r, unsigned flags) +struct packed_git **packfile_store_get_kept_pack_cache(struct packfile_store *store, + unsigned flags) { - maybe_invalidate_kept_pack_cache(r, flags); + maybe_invalidate_kept_pack_cache(store, flags); - if (!r->objects->packfiles->kept_cache.packs) { + if (!store->kept_cache.packs) { struct packed_git **packs = NULL; + struct packfile_list_entry *e; size_t nr = 0, alloc = 0; - struct packed_git *p; /* * We want "all" packs here, because we need to cover ones that @@ -2192,9 +2193,11 @@ struct packed_git **kept_pack_cache(struct repository *r, unsigned flags) * covers, one kept and one not kept, but the midx returns only * the non-kept version. */ - repo_for_each_pack(r, p) { - if ((p->pack_keep && (flags & ON_DISK_KEEP_PACKS)) || - (p->pack_keep_in_core && (flags & IN_CORE_KEEP_PACKS))) { + for (e = packfile_store_get_packs(store); e; e = e->next) { + struct packed_git *p = e->pack; + + if ((p->pack_keep && (flags & KEPT_PACK_ON_DISK)) || + (p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE))) { ALLOC_GROW(packs, nr + 1, alloc); packs[nr++] = p; } @@ -2202,11 +2205,11 @@ struct packed_git **kept_pack_cache(struct repository *r, unsigned flags) ALLOC_GROW(packs, nr + 1, alloc); packs[nr] = NULL; - r->objects->packfiles->kept_cache.packs = packs; - r->objects->packfiles->kept_cache.flags = flags; + store->kept_cache.packs = packs; + store->kept_cache.flags = flags; } - return r->objects->packfiles->kept_cache.packs; + return store->kept_cache.packs; } int find_kept_pack_entry(struct repository *r, @@ -2214,9 +2217,9 @@ int find_kept_pack_entry(struct repository *r, unsigned flags, struct pack_entry *e) { - struct packed_git **cache; + struct packed_git **cache = packfile_store_get_kept_pack_cache(r->objects->packfiles, flags); - for (cache = kept_pack_cache(r, flags); *cache; cache++) { + for (; *cache; cache++) { struct packed_git *p = *cache; if (fill_pack_entry(oid, e, p)) return 1; diff --git a/packfile.h b/packfile.h index 33cc1c1654f081..410f85f03d7710 100644 --- a/packfile.h +++ b/packfile.h @@ -90,9 +90,10 @@ struct packfile_store { * is an on-disk ".keep" file or because they are marked as "kept" in * memory. * - * Should not be accessed directly, but via `kept_pack_cache()`. The - * list of packs gets invalidated when the stored flags and the flags - * passed to `kept_pack_cache()` mismatch. + * Should not be accessed directly, but via + * `packfile_store_get_kept_pack_cache()`. The list of packs gets + * invalidated when the stored flags and the flags passed to + * `packfile_store_get_kept_pack_cache()` mismatch. */ struct { struct packed_git **packs; @@ -210,6 +211,19 @@ struct packed_git *packfile_store_load_pack(struct packfile_store *store, int packfile_store_freshen_object(struct packfile_store *store, const struct object_id *oid); +enum kept_pack_type { + KEPT_PACK_ON_DISK = (1 << 0), + KEPT_PACK_IN_CORE = (1 << 1), +}; + +/* + * Retrieve the cache of kept packs from the given packfile store. Accepts a + * combination of `kept_pack_type` flags. The cache is computed on demand and + * will be recomputed whenever the flags change. + */ +struct packed_git **packfile_store_get_kept_pack_cache(struct packfile_store *store, + unsigned flags); + struct pack_window { struct pack_window *next; unsigned char *base; @@ -385,9 +399,6 @@ int packed_object_info(struct repository *r, void mark_bad_packed_object(struct packed_git *, const struct object_id *); const struct packed_git *has_packed_and_bad(struct repository *, const struct object_id *); -#define ON_DISK_KEEP_PACKS 1 -#define IN_CORE_KEEP_PACKS 2 - /* * Iff a pack file in the given repository contains the object named by sha1, * return true and store its location to e. @@ -398,8 +409,6 @@ int has_object_pack(struct repository *r, const struct object_id *oid); int has_object_kept_pack(struct repository *r, const struct object_id *oid, unsigned flags); -struct packed_git **kept_pack_cache(struct repository *r, unsigned flags); - /* * Return 1 if an object in a promisor packfile is or refers to the given * object, 0 otherwise. diff --git a/reachable.c b/reachable.c index b753c395530b6d..4b532039d5f84f 100644 --- a/reachable.c +++ b/reachable.c @@ -242,7 +242,7 @@ static int want_recent_object(struct recent_data *data, const struct object_id *oid) { if (data->ignore_in_core_kept_packs && - has_object_kept_pack(data->revs->repo, oid, IN_CORE_KEEP_PACKS)) + has_object_kept_pack(data->revs->repo, oid, KEPT_PACK_IN_CORE)) return 0; return 1; } diff --git a/revision.c b/revision.c index 5f0850ae5c9c1a..64d223a7c683f0 100644 --- a/revision.c +++ b/revision.c @@ -2541,14 +2541,14 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg die(_("--unpacked= no longer supported")); } else if (!strcmp(arg, "--no-kept-objects")) { revs->no_kept_objects = 1; - revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS; - revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS; + revs->keep_pack_cache_flags |= KEPT_PACK_IN_CORE; + revs->keep_pack_cache_flags |= KEPT_PACK_ON_DISK; } else if (skip_prefix(arg, "--no-kept-objects=", &optarg)) { revs->no_kept_objects = 1; if (!strcmp(optarg, "in-core")) - revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS; + revs->keep_pack_cache_flags |= KEPT_PACK_IN_CORE; if (!strcmp(optarg, "on-disk")) - revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS; + revs->keep_pack_cache_flags |= KEPT_PACK_ON_DISK; } else if (!strcmp(arg, "-r")) { revs->diff = 1; revs->diffopt.flags.recursive = 1; From eb9ec52d95b74d80dd64190de1349aab740990c9 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:12 +0100 Subject: [PATCH 08/50] packfile: refactor misleading code when unusing pack windows The function `unuse_one_window()` is responsible for unmapping one of the packfile windows, which is done when we have exceeded the allowed number of window. The function receives a `struct packed_git` as input, which serves as an additional packfile that should be considered to be closed. If not given, we seemingly skip that and instead go through all of the repository's packfiles. The conditional that checks whether we have a packfile though does not make much sense anymore, as we dereference the packfile regardless of whether or not it is a `NULL` pointer to derive the repository's packfile store. The function was originally introduced via f0e17e86e1 (pack: move release_pack_memory(), 2017-08-18), and here we indeed had a caller that passed a `NULL` pointer. That caller was later removed via 9827d4c185 (packfile: drop release_pack_memory(), 2019-08-12), so starting with that commit we always pass a `struct packed_git`. In 9c5ce06d74 (packfile: use `repository` from `packed_git` directly, 2024-12-03) we then inadvertently started to rely on the fact that the pointer is never `NULL` because we use it now to identify the repository. Arguably, it didn't really make sense in the first place that the caller provides a packfile, as the selected window would have been overridden anyway by the subsequent loop over all packfiles if there was an older window. So the overall logic is quite misleading overall. The only case where it _could_ make a difference is when there were two packfiles with the same `last_used` value, but that case doesn't ever happen because the `pack_used_ctr` is strictly increasing. Refactor the code so that we instead pass in the object database to help make the code less misleading. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/packfile.c b/packfile.c index 191344eb1cf346..37006124650de9 100644 --- a/packfile.c +++ b/packfile.c @@ -355,16 +355,15 @@ static void scan_windows(struct packed_git *p, } } -static int unuse_one_window(struct packed_git *current) +static int unuse_one_window(struct object_database *odb) { struct packfile_list_entry *e; struct packed_git *lru_p = NULL; struct pack_window *lru_w = NULL, *lru_l = NULL; - if (current) - scan_windows(current, &lru_p, &lru_w, &lru_l); - for (e = current->repo->objects->packfiles->packs.head; e; e = e->next) + for (e = odb->packfiles->packs.head; e; e = e->next) scan_windows(e->pack, &lru_p, &lru_w, &lru_l); + if (lru_p) { munmap(lru_w->base, lru_w->len); pack_mapped -= lru_w->len; @@ -740,8 +739,8 @@ unsigned char *use_pack(struct packed_git *p, win->len = (size_t)len; pack_mapped += win->len; - while (settings->packed_git_limit < pack_mapped - && unuse_one_window(p)) + while (settings->packed_git_limit < pack_mapped && + unuse_one_window(p->repo->objects)) ; /* nothing */ win->base = xmmap_gently(NULL, win->len, PROT_READ, MAP_PRIVATE, From 84f0e60b28de69d1ccb7a51b729af6202b6cf4c8 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:13 +0100 Subject: [PATCH 09/50] packfile: move packfile store into object source The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/fast-import.c | 37 +++++++----- builtin/grep.c | 6 +- builtin/index-pack.c | 2 +- builtin/pack-objects.c | 96 ++++++++++++++++--------------- http.c | 2 +- midx.c | 5 +- odb.c | 36 ++++++------ odb.h | 6 +- odb/streaming.c | 9 ++- packfile.c | 127 ++++++++++++++++++++++++++--------------- packfile.h | 62 +++++++++++++++++--- 11 files changed, 243 insertions(+), 145 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 7849005ccb15ff..b8a7757cfda943 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -900,7 +900,7 @@ static void end_packfile(void) idx_name = keep_pack(create_index()); /* Register the packfile with core git's machinery. */ - new_p = packfile_store_load_pack(pack_data->repo->objects->packfiles, + new_p = packfile_store_load_pack(pack_data->repo->objects->sources->packfiles, idx_name, 1); if (!new_p) die(_("core Git rejected index %s"), idx_name); @@ -955,7 +955,7 @@ static int store_object( struct object_id *oidout, uintmax_t mark) { - struct packfile_store *packs = the_repository->objects->packfiles; + struct odb_source *source; void *out, *delta; struct object_entry *e; unsigned char hdr[96]; @@ -979,7 +979,11 @@ static int store_object( if (e->idx.offset) { duplicate_count_by_type[type]++; return 1; - } else if (packfile_list_find_oid(packfile_store_get_packs(packs), &oid)) { + } + + for (source = the_repository->objects->sources; source; source = source->next) { + if (!packfile_list_find_oid(packfile_store_get_packs(source->packfiles), &oid)) + continue; e->type = type; e->pack_id = MAX_PACK_ID; e->idx.offset = 1; /* just not zero! */ @@ -1096,10 +1100,10 @@ static void truncate_pack(struct hashfile_checkpoint *checkpoint) static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) { - struct packfile_store *packs = the_repository->objects->packfiles; size_t in_sz = 64 * 1024, out_sz = 64 * 1024; unsigned char *in_buf = xmalloc(in_sz); unsigned char *out_buf = xmalloc(out_sz); + struct odb_source *source; struct object_entry *e; struct object_id oid; unsigned long hdrlen; @@ -1179,24 +1183,29 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) if (e->idx.offset) { duplicate_count_by_type[OBJ_BLOB]++; truncate_pack(&checkpoint); + goto out; + } - } else if (packfile_list_find_oid(packfile_store_get_packs(packs), &oid)) { + for (source = the_repository->objects->sources; source; source = source->next) { + if (!packfile_list_find_oid(packfile_store_get_packs(source->packfiles), &oid)) + continue; e->type = OBJ_BLOB; e->pack_id = MAX_PACK_ID; e->idx.offset = 1; /* just not zero! */ duplicate_count_by_type[OBJ_BLOB]++; truncate_pack(&checkpoint); - - } else { - e->depth = 0; - e->type = OBJ_BLOB; - e->pack_id = pack_id; - e->idx.offset = offset; - e->idx.crc32 = crc32_end(pack_file); - object_count++; - object_count_by_type[OBJ_BLOB]++; + goto out; } + e->depth = 0; + e->type = OBJ_BLOB; + e->pack_id = pack_id; + e->idx.offset = offset; + e->idx.crc32 = crc32_end(pack_file); + object_count++; + object_count_by_type[OBJ_BLOB]++; + +out: free(in_buf); free(out_buf); } diff --git a/builtin/grep.c b/builtin/grep.c index 53cccf2d25068c..4855b871ddf368 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1213,8 +1213,12 @@ int cmd_grep(int argc, */ if (recurse_submodules) repo_read_gitmodules(the_repository, 1); + /* + * Note: `packfile_store_prepare()` prepares stores from all + * sources. This will be fixed in a subsequent commit. + */ if (startup_info->have_repository) - packfile_store_prepare(the_repository->objects->packfiles); + packfile_store_prepare(the_repository->objects->sources->packfiles); start_threads(&opt); } else { diff --git a/builtin/index-pack.c b/builtin/index-pack.c index a7e901e49c06d4..b67fb0256cc831 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1638,7 +1638,7 @@ static void final(const char *final_pack_name, const char *curr_pack_name, hash, "idx", 1); if (do_fsck_object && startup_info->have_repository) - packfile_store_load_pack(the_repository->objects->packfiles, + packfile_store_load_pack(the_repository->objects->sources->packfiles, final_index_name, 0); if (!from_stdin) { diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index e86b8f387a8c64..7fd90a99963fea 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1529,49 +1529,53 @@ static int want_cruft_object_mtime(struct repository *r, const struct object_id *oid, unsigned flags, uint32_t mtime) { - struct packed_git **cache = packfile_store_get_kept_pack_cache(r->objects->packfiles, flags); + struct odb_source *source; - for (; *cache; cache++) { - struct packed_git *p = *cache; - off_t ofs; - uint32_t candidate_mtime; + for (source = r->objects->sources; source; source = source->next) { + struct packed_git **cache = packfile_store_get_kept_pack_cache(source->packfiles, flags); - ofs = find_pack_entry_one(oid, p); - if (!ofs) - continue; + for (; *cache; cache++) { + struct packed_git *p = *cache; + off_t ofs; + uint32_t candidate_mtime; - /* - * We have a copy of the object 'oid' in a non-cruft - * pack. We can avoid packing an additional copy - * regardless of what the existing copy's mtime is since - * it is outside of a cruft pack. - */ - if (!p->is_cruft) - return 0; - - /* - * If we have a copy of the object 'oid' in a cruft - * pack, then either read the cruft pack's mtime for - * that object, or, if that can't be loaded, assume the - * pack's mtime itself. - */ - if (!load_pack_mtimes(p)) { - uint32_t pos; - if (offset_to_pack_pos(p, ofs, &pos) < 0) + ofs = find_pack_entry_one(oid, p); + if (!ofs) continue; - candidate_mtime = nth_packed_mtime(p, pos); - } else { - candidate_mtime = p->mtime; - } - /* - * We have a surviving copy of the object in a cruft - * pack whose mtime is greater than or equal to the one - * we are considering. We can thus avoid packing an - * additional copy of that object. - */ - if (mtime <= candidate_mtime) - return 0; + /* + * We have a copy of the object 'oid' in a non-cruft + * pack. We can avoid packing an additional copy + * regardless of what the existing copy's mtime is since + * it is outside of a cruft pack. + */ + if (!p->is_cruft) + return 0; + + /* + * If we have a copy of the object 'oid' in a cruft + * pack, then either read the cruft pack's mtime for + * that object, or, if that can't be loaded, assume the + * pack's mtime itself. + */ + if (!load_pack_mtimes(p)) { + uint32_t pos; + if (offset_to_pack_pos(p, ofs, &pos) < 0) + continue; + candidate_mtime = nth_packed_mtime(p, pos); + } else { + candidate_mtime = p->mtime; + } + + /* + * We have a surviving copy of the object in a cruft + * pack whose mtime is greater than or equal to the one + * we are considering. We can thus avoid packing an + * additional copy of that object. + */ + if (mtime <= candidate_mtime) + return 0; + } } return -1; @@ -1749,13 +1753,15 @@ static int want_object_in_pack_mtime(const struct object_id *oid, } } - for (e = the_repository->objects->packfiles->packs.head; e; e = e->next) { - struct packed_git *p = e->pack; - want = want_object_in_pack_one(p, oid, exclude, found_pack, found_offset, found_mtime); - if (!exclude && want > 0) - packfile_list_prepend(&the_repository->objects->packfiles->packs, p); - if (want != -1) - return want; + for (source = the_repository->objects->sources; source; source = source->next) { + for (e = source->packfiles->packs.head; e; e = e->next) { + struct packed_git *p = e->pack; + want = want_object_in_pack_one(p, oid, exclude, found_pack, found_offset, found_mtime); + if (!exclude && want > 0) + packfile_list_prepend(&source->packfiles->packs, p); + if (want != -1) + return want; + } } if (uri_protocols.nr) { diff --git a/http.c b/http.c index 41f850db16d19f..7815f144de3d69 100644 --- a/http.c +++ b/http.c @@ -2544,7 +2544,7 @@ void http_install_packfile(struct packed_git *p, struct packfile_list *list_to_remove_from) { packfile_list_remove(list_to_remove_from, p); - packfile_store_add_pack(the_repository->objects->packfiles, p); + packfile_store_add_pack(the_repository->objects->sources->packfiles, p); } struct http_pack_request *new_http_pack_request( diff --git a/midx.c b/midx.c index 24e1e721754d0c..dbb2aa68baa9a7 100644 --- a/midx.c +++ b/midx.c @@ -95,7 +95,7 @@ static int midx_read_object_offsets(const unsigned char *chunk_start, struct multi_pack_index *get_multi_pack_index(struct odb_source *source) { - packfile_store_prepare(source->odb->packfiles); + packfile_store_prepare(source->packfiles); return source->midx; } @@ -447,7 +447,6 @@ static uint32_t midx_for_pack(struct multi_pack_index **_m, int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id) { - struct repository *r = m->source->odb->repo; struct strbuf pack_name = STRBUF_INIT; struct packed_git *p; @@ -460,7 +459,7 @@ int prepare_midx_pack(struct multi_pack_index *m, strbuf_addf(&pack_name, "%s/pack/%s", m->source->path, m->pack_names[pack_int_id]); - p = packfile_store_load_pack(r->objects->packfiles, + p = packfile_store_load_pack(m->source->packfiles, pack_name.buf, m->source->local); strbuf_release(&pack_name); diff --git a/odb.c b/odb.c index 94144a69f532e9..f159fbdd991bd1 100644 --- a/odb.c +++ b/odb.c @@ -155,6 +155,7 @@ static struct odb_source *odb_source_new(struct object_database *odb, source->local = local; source->path = xstrdup(path); source->loose = odb_source_loose_new(source); + source->packfiles = packfile_store_new(source); return source; } @@ -373,6 +374,7 @@ static void odb_source_free(struct odb_source *source) { free(source->path); odb_source_loose_free(source->loose); + packfile_store_free(source->packfiles); free(source); } @@ -704,19 +706,19 @@ static int do_oid_object_info_extended(struct object_database *odb, while (1) { struct odb_source *source; - if (!packfile_store_read_object_info(odb->packfiles, real, oi, flags)) - return 0; - /* Most likely it's a loose object. */ - for (source = odb->sources; source; source = source->next) - if (!odb_source_loose_read_object_info(source, real, oi, flags)) + for (source = odb->sources; source; source = source->next) { + if (!packfile_store_read_object_info(source->packfiles, real, oi, flags) || + !odb_source_loose_read_object_info(source, real, oi, flags)) return 0; + } /* Not a loose object; someone else may have just packed it. */ if (!(flags & OBJECT_INFO_QUICK)) { odb_reprepare(odb->repo->objects); - if (!packfile_store_read_object_info(odb->packfiles, real, oi, flags)) - return 0; + for (source = odb->sources; source; source = source->next) + if (!packfile_store_read_object_info(source->packfiles, real, oi, flags)) + return 0; } /* @@ -975,13 +977,14 @@ int odb_freshen_object(struct object_database *odb, { struct odb_source *source; - if (packfile_store_freshen_object(odb->packfiles, oid)) - return 1; - odb_prepare_alternates(odb); - for (source = odb->sources; source; source = source->next) + for (source = odb->sources; source; source = source->next) { + if (packfile_store_freshen_object(source->packfiles, oid)) + return 1; + if (odb_source_loose_freshen_object(source, oid)) return 1; + } return 0; } @@ -1064,7 +1067,6 @@ struct object_database *odb_new(struct repository *repo, o->sources = odb_source_new(o, primary_source, true); o->sources_tail = &o->sources->next; o->alternate_db = xstrdup_or_null(secondary_sources); - o->packfiles = packfile_store_new(o->sources); free(to_free); @@ -1077,9 +1079,8 @@ void odb_close(struct object_database *o) { struct odb_source *source; - packfile_store_close(o->packfiles); - for (source = o->sources; source; source = source->next) { + packfile_store_close(source->packfiles); if (source->midx) close_midx(source->midx); source->midx = NULL; @@ -1118,7 +1119,6 @@ void odb_free(struct object_database *o) free((char *) o->cached_objects[i].value.buf); free(o->cached_objects); - packfile_store_free(o->packfiles); string_list_clear(&o->submodule_source_paths, 0); chdir_notify_unregister(NULL, odb_update_commondir, o); @@ -1141,13 +1141,13 @@ void odb_reprepare(struct object_database *o) o->loaded_alternates = 0; odb_prepare_alternates(o); - for (source = o->sources; source; source = source->next) + for (source = o->sources; source; source = source->next) { odb_source_loose_reprepare(source); + packfile_store_reprepare(source->packfiles); + } o->approximate_object_count_valid = 0; - packfile_store_reprepare(o->packfiles); - obj_read_unlock(); } diff --git a/odb.h b/odb.h index 014cd9585a2f6e..c97b41c58cd447 100644 --- a/odb.h +++ b/odb.h @@ -51,6 +51,9 @@ struct odb_source { /* Private state for loose objects. */ struct odb_source_loose *loose; + /* Should only be accessed directly by packfile.c and midx.c. */ + struct packfile_store *packfiles; + /* * private data * @@ -128,9 +131,6 @@ struct object_database { struct commit_graph *commit_graph; unsigned commit_graph_attempted : 1; /* if loading has been attempted */ - /* Should only be accessed directly by packfile.c and midx.c. */ - struct packfile_store *packfiles; - /* * This is meant to hold a *small* number of objects that you would * want odb_read_object() to be able to return, but yet you do not want diff --git a/odb/streaming.c b/odb/streaming.c index 745cd486fbb33d..4a4474f891a07f 100644 --- a/odb/streaming.c +++ b/odb/streaming.c @@ -185,13 +185,12 @@ static int istream_source(struct odb_read_stream **out, { struct odb_source *source; - if (!packfile_store_read_object_stream(out, odb->packfiles, oid)) - return 0; - odb_prepare_alternates(odb); - for (source = odb->sources; source; source = source->next) - if (!odb_source_loose_read_object_stream(out, source, oid)) + for (source = odb->sources; source; source = source->next) { + if (!packfile_store_read_object_stream(out, source->packfiles, oid) || + !odb_source_loose_read_object_stream(out, source, oid)) return 0; + } return open_istream_incore(out, odb, oid); } diff --git a/packfile.c b/packfile.c index 37006124650de9..a0225cb2cb5ea2 100644 --- a/packfile.c +++ b/packfile.c @@ -357,12 +357,14 @@ static void scan_windows(struct packed_git *p, static int unuse_one_window(struct object_database *odb) { + struct odb_source *source; struct packfile_list_entry *e; struct packed_git *lru_p = NULL; struct pack_window *lru_w = NULL, *lru_l = NULL; - for (e = odb->packfiles->packs.head; e; e = e->next) - scan_windows(e->pack, &lru_p, &lru_w, &lru_l); + for (source = odb->sources; source; source = source->next) + for (e = source->packfiles->packs.head; e; e = e->next) + scan_windows(e->pack, &lru_p, &lru_w, &lru_l); if (lru_p) { munmap(lru_w->base, lru_w->len); @@ -528,15 +530,18 @@ static void find_lru_pack(struct packed_git *p, struct packed_git **lru_p, struc static int close_one_pack(struct repository *r) { + struct odb_source *source; struct packfile_list_entry *e; struct packed_git *lru_p = NULL; struct pack_window *mru_w = NULL; int accept_windows_inuse = 1; - for (e = r->objects->packfiles->packs.head; e; e = e->next) { - if (e->pack->pack_fd == -1) - continue; - find_lru_pack(e->pack, &lru_p, &mru_w, &accept_windows_inuse); + for (source = r->objects->sources; source; source = source->next) { + for (e = source->packfiles->packs.head; e; e = e->next) { + if (e->pack->pack_fd == -1) + continue; + find_lru_pack(e->pack, &lru_p, &mru_w, &accept_windows_inuse); + } } if (lru_p) @@ -987,7 +992,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len, if (strip_suffix_mem(full_name, &base_len, ".idx") && !(data->source->midx && midx_contains_pack(data->source->midx, file_name))) { char *trimmed_path = xstrndup(full_name, full_name_len); - packfile_store_load_pack(data->source->odb->packfiles, + packfile_store_load_pack(data->source->packfiles, trimmed_path, data->source->local); free(trimmed_path); } @@ -1245,11 +1250,15 @@ void mark_bad_packed_object(struct packed_git *p, const struct object_id *oid) const struct packed_git *has_packed_and_bad(struct repository *r, const struct object_id *oid) { - struct packfile_list_entry *e; + struct odb_source *source; + + for (source = r->objects->sources; source; source = source->next) { + struct packfile_list_entry *e; + for (e = source->packfiles->packs.head; e; e = e->next) + if (oidset_contains(&e->pack->bad_objects, oid)) + return e->pack; + } - for (e = r->objects->packfiles->packs.head; e; e = e->next) - if (oidset_contains(&e->pack->bad_objects, oid)) - return e->pack; return NULL; } @@ -2089,26 +2098,32 @@ static int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e) { - struct packfile_list_entry *l; + struct odb_source *source; - packfile_store_prepare(r->objects->packfiles); + /* + * Note: `packfile_store_prepare()` prepares stores from all sources. + * This will be fixed in a subsequent commit. + */ + packfile_store_prepare(r->objects->sources->packfiles); - for (struct odb_source *source = r->objects->sources; source; source = source->next) + for (source = r->objects->sources; source; source = source->next) if (source->midx && fill_midx_entry(source->midx, oid, e)) return 1; - if (!r->objects->packfiles->packs.head) - return 0; + for (source = r->objects->sources; source; source = source->next) { + struct packfile_list_entry *l; - for (l = r->objects->packfiles->packs.head; l; l = l->next) { - struct packed_git *p = l->pack; + for (l = source->packfiles->packs.head; l; l = l->next) { + struct packed_git *p = l->pack; - if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) { - if (!r->objects->packfiles->skip_mru_updates) - packfile_list_prepend(&r->objects->packfiles->packs, p); - return 1; + if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) { + if (!source->packfiles->skip_mru_updates) + packfile_list_prepend(&source->packfiles->packs, p); + return 1; + } } } + return 0; } @@ -2216,12 +2231,18 @@ int find_kept_pack_entry(struct repository *r, unsigned flags, struct pack_entry *e) { - struct packed_git **cache = packfile_store_get_kept_pack_cache(r->objects->packfiles, flags); + struct odb_source *source; - for (; *cache; cache++) { - struct packed_git *p = *cache; - if (fill_pack_entry(oid, e, p)) - return 1; + for (source = r->objects->sources; source; source = source->next) { + struct packed_git **cache; + + cache = packfile_store_get_kept_pack_cache(source->packfiles, flags); + + for (; *cache; cache++) { + struct packed_git *p = *cache; + if (fill_pack_entry(oid, e, p)) + return 1; + } } return 0; @@ -2287,32 +2308,46 @@ int for_each_object_in_pack(struct packed_git *p, int for_each_packed_object(struct repository *repo, each_packed_object_fn cb, void *data, enum for_each_object_flags flags) { - struct packed_git *p; + struct odb_source *source; int r = 0; int pack_errors = 0; - repo->objects->packfiles->skip_mru_updates = true; - repo_for_each_pack(repo, p) { - if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local) - continue; - if ((flags & FOR_EACH_OBJECT_PROMISOR_ONLY) && - !p->pack_promisor) - continue; - if ((flags & FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS) && - p->pack_keep_in_core) - continue; - if ((flags & FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS) && - p->pack_keep) - continue; - if (open_pack_index(p)) { - pack_errors = 1; - continue; + odb_prepare_alternates(repo->objects); + + for (source = repo->objects->sources; source; source = source->next) { + struct packfile_list_entry *e; + + source->packfiles->skip_mru_updates = true; + + for (e = packfile_store_get_packs(source->packfiles); e; e = e->next) { + struct packed_git *p = e->pack; + + if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local) + continue; + if ((flags & FOR_EACH_OBJECT_PROMISOR_ONLY) && + !p->pack_promisor) + continue; + if ((flags & FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS) && + p->pack_keep_in_core) + continue; + if ((flags & FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS) && + p->pack_keep) + continue; + if (open_pack_index(p)) { + pack_errors = 1; + continue; + } + + r = for_each_object_in_pack(p, cb, data, flags); + if (r) + break; } - r = for_each_object_in_pack(p, cb, data, flags); + + source->packfiles->skip_mru_updates = false; + if (r) break; } - repo->objects->packfiles->skip_mru_updates = false; return r ? r : pack_errors; } diff --git a/packfile.h b/packfile.h index 410f85f03d7710..07f7cdbad13491 100644 --- a/packfile.h +++ b/packfile.h @@ -5,6 +5,7 @@ #include "object.h" #include "odb.h" #include "oidset.h" +#include "repository.h" #include "strmap.h" /* in odb.h */ @@ -170,14 +171,65 @@ void packfile_store_reprepare(struct packfile_store *store); void packfile_store_add_pack(struct packfile_store *store, struct packed_git *pack); +/* + * Get all packs managed by the given store, including packfiles that are + * referenced by multi-pack indices. + */ +struct packfile_list_entry *packfile_store_get_packs(struct packfile_store *store); + +struct repo_for_each_pack_data { + struct odb_source *source; + struct packfile_list_entry *entry; +}; + +static inline struct repo_for_each_pack_data repo_for_eack_pack_data_init(struct repository *repo) +{ + struct repo_for_each_pack_data data = { 0 }; + + odb_prepare_alternates(repo->objects); + + for (struct odb_source *source = repo->objects->sources; source; source = source->next) { + struct packfile_list_entry *entry = packfile_store_get_packs(source->packfiles); + if (!entry) + continue; + data.source = source; + data.entry = entry; + break; + } + + return data; +} + +static inline void repo_for_each_pack_data_next(struct repo_for_each_pack_data *data) +{ + struct odb_source *source; + + data->entry = data->entry->next; + if (data->entry) + return; + + for (source = data->source->next; source; source = source->next) { + struct packfile_list_entry *entry = packfile_store_get_packs(source->packfiles); + if (!entry) + continue; + data->source = source; + data->entry = entry; + return; + } + + data->source = NULL; + data->entry = NULL; +} + /* * Load and iterate through all packs of the given repository. This helper * function will yield packfiles from all object sources connected to the * repository. */ #define repo_for_each_pack(repo, p) \ - for (struct packfile_list_entry *e = packfile_store_get_packs(repo->objects->packfiles); \ - ((p) = (e ? e->pack : NULL)); e = e->next) + for (struct repo_for_each_pack_data eack_pack_data = repo_for_eack_pack_data_init(repo); \ + ((p) = (eack_pack_data.entry ? eack_pack_data.entry->pack : NULL)); \ + repo_for_each_pack_data_next(&eack_pack_data)) int packfile_store_read_object_stream(struct odb_read_stream **out, struct packfile_store *store, @@ -194,12 +246,6 @@ int packfile_store_read_object_info(struct packfile_store *store, struct object_info *oi, unsigned flags); -/* - * Get all packs managed by the given store, including packfiles that are - * referenced by multi-pack indices. - */ -struct packfile_list_entry *packfile_store_get_packs(struct packfile_store *store); - /* * Open the packfile and add it to the store if it isn't yet known. Returns * either the newly opened packfile or the preexisting packfile. Returns a From 7b330a11de903f11a73084d41dd00bbef71cd622 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:14 +0100 Subject: [PATCH 10/50] packfile: only prepare owning store in `packfile_store_get_packs()` When calling `packfile_store_get_packs()` we prepare not only the provided packfile store, but also all those of all other sources part of the same object database. This was required when the store was still sitting on the object database level. But now that it sits on the source level it's not anymore. Adapt the code so that we only prepare the MIDX of the provided store. All callers only work in the context of a single store or call the function in a loop over all sources, so this change shouldn't have any practical effects. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/packfile.c b/packfile.c index a0225cb2cb5ea2..c46d53b75dbea7 100644 --- a/packfile.c +++ b/packfile.c @@ -1092,10 +1092,8 @@ struct packfile_list_entry *packfile_store_get_packs(struct packfile_store *stor { packfile_store_prepare(store); - for (struct odb_source *source = store->source->odb->sources; source; source = source->next) { - struct multi_pack_index *m = source->midx; - if (!m) - continue; + if (store->source->midx) { + struct multi_pack_index *m = store->source->midx; for (uint32_t i = 0; i < m->num_packs + m->num_packs_in_base; i++) prepare_midx_pack(m, i); } From 8384cbcb4c737c6d1c6becb40e439c398e3624b4 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:15 +0100 Subject: [PATCH 11/50] packfile: only prepare owning store in `packfile_store_prepare()` When calling `packfile_store_prepare()` we prepare not only the provided packfile store, but also all those of all other sources part of the same object database. This was required when the store was still sitting on the object database level. But now that it sits on the source level it's not anymore. Refactor the code so that we only prepare the single packfile store passed by the caller. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/grep.c | 14 ++++++++------ packfile.c | 19 +++++-------------- 2 files changed, 13 insertions(+), 20 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 4855b871ddf368..5b8b87b1ac4d7a 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1213,12 +1213,14 @@ int cmd_grep(int argc, */ if (recurse_submodules) repo_read_gitmodules(the_repository, 1); - /* - * Note: `packfile_store_prepare()` prepares stores from all - * sources. This will be fixed in a subsequent commit. - */ - if (startup_info->have_repository) - packfile_store_prepare(the_repository->objects->sources->packfiles); + + if (startup_info->have_repository) { + struct odb_source *source; + + odb_prepare_alternates(the_repository->objects); + for (source = the_repository->objects->sources; source; source = source->next) + packfile_store_prepare(source->packfiles); + } start_threads(&opt); } else { diff --git a/packfile.c b/packfile.c index c46d53b75dbea7..23d8f7cb938cca 100644 --- a/packfile.c +++ b/packfile.c @@ -1063,16 +1063,11 @@ static int sort_pack(const struct packfile_list_entry *a, void packfile_store_prepare(struct packfile_store *store) { - struct odb_source *source; - if (store->initialized) return; - odb_prepare_alternates(store->source->odb); - for (source = store->source->odb->sources; source; source = source->next) { - prepare_multi_pack_index_one(source); - prepare_packed_git_one(source); - } + prepare_multi_pack_index_one(store->source); + prepare_packed_git_one(store->source); sort_packs(&store->packs.head, sort_pack); for (struct packfile_list_entry *e = store->packs.head; e; e = e->next) @@ -2098,15 +2093,11 @@ static int find_pack_entry(struct repository *r, { struct odb_source *source; - /* - * Note: `packfile_store_prepare()` prepares stores from all sources. - * This will be fixed in a subsequent commit. - */ - packfile_store_prepare(r->objects->sources->packfiles); - - for (source = r->objects->sources; source; source = source->next) + for (source = r->objects->sources; source; source = source->next) { + packfile_store_prepare(r->objects->sources->packfiles); if (source->midx && fill_midx_entry(source->midx, oid, e)) return 1; + } for (source = r->objects->sources; source; source = source->next) { struct packfile_list_entry *l; From 6acefa0d2ca0fd95461b19026917d09210032b3d Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:16 +0100 Subject: [PATCH 12/50] packfile: inline `find_kept_pack_entry()` The `find_kept_pack_entry()` function is only used in `has_object_kept_pack()`, which is only a trivial wrapper itself. Inline the latter into the former. Furthermore, reorder the code so that we can drop the declaration of the function in "packfile.h". This allows us to make the function file-local. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 28 ++++++++++------------------ packfile.h | 6 ------ 2 files changed, 10 insertions(+), 24 deletions(-) diff --git a/packfile.c b/packfile.c index 23d8f7cb938cca..3bce1b150d5b94 100644 --- a/packfile.c +++ b/packfile.c @@ -2215,12 +2215,17 @@ struct packed_git **packfile_store_get_kept_pack_cache(struct packfile_store *st return store->kept_cache.packs; } -int find_kept_pack_entry(struct repository *r, - const struct object_id *oid, - unsigned flags, - struct pack_entry *e) +int has_object_pack(struct repository *r, const struct object_id *oid) +{ + struct pack_entry e; + return find_pack_entry(r, oid, &e); +} + +int has_object_kept_pack(struct repository *r, const struct object_id *oid, + unsigned flags) { struct odb_source *source; + struct pack_entry e; for (source = r->objects->sources; source; source = source->next) { struct packed_git **cache; @@ -2229,7 +2234,7 @@ int find_kept_pack_entry(struct repository *r, for (; *cache; cache++) { struct packed_git *p = *cache; - if (fill_pack_entry(oid, e, p)) + if (fill_pack_entry(oid, &e, p)) return 1; } } @@ -2237,19 +2242,6 @@ int find_kept_pack_entry(struct repository *r, return 0; } -int has_object_pack(struct repository *r, const struct object_id *oid) -{ - struct pack_entry e; - return find_pack_entry(r, oid, &e); -} - -int has_object_kept_pack(struct repository *r, const struct object_id *oid, - unsigned flags) -{ - struct pack_entry e; - return find_kept_pack_entry(r, oid, flags, &e); -} - int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data, enum for_each_object_flags flags) diff --git a/packfile.h b/packfile.h index 07f7cdbad13491..08a666d538213e 100644 --- a/packfile.h +++ b/packfile.h @@ -445,12 +445,6 @@ int packed_object_info(struct repository *r, void mark_bad_packed_object(struct packed_git *, const struct object_id *); const struct packed_git *has_packed_and_bad(struct repository *, const struct object_id *); -/* - * Iff a pack file in the given repository contains the object named by sha1, - * return true and store its location to e. - */ -int find_kept_pack_entry(struct repository *r, const struct object_id *oid, unsigned flags, struct pack_entry *e); - int has_object_pack(struct repository *r, const struct object_id *oid); int has_object_kept_pack(struct repository *r, const struct object_id *oid, unsigned flags); From a593373b097322adc74aa5f9614c7650f87ebed9 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:17 +0100 Subject: [PATCH 13/50] packfile: refactor `find_pack_entry()` to work on the packfile store The function `find_pack_entry()` doesn't work on a specific packfile store, but instead works on the whole repository. This causes a bit of a conceptual mismatch in its callers: - `packfile_store_freshen_object()` supposedly acts on a store, and its callers know to iterate through all sources already. - `packfile_store_read_object_info()` behaves likewise. The only exception that doesn't know to handle iteration through sources is `has_object_pack()`, but that function is trivial to adapt. Refactor the code so that `find_pack_entry()` works on the packfile store level instead. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 43 +++++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/packfile.c b/packfile.c index 3bce1b150d5b94..0e4c63e11d1d4d 100644 --- a/packfile.c +++ b/packfile.c @@ -2087,29 +2087,23 @@ static int fill_pack_entry(const struct object_id *oid, return 1; } -static int find_pack_entry(struct repository *r, +static int find_pack_entry(struct packfile_store *store, const struct object_id *oid, struct pack_entry *e) { - struct odb_source *source; - - for (source = r->objects->sources; source; source = source->next) { - packfile_store_prepare(r->objects->sources->packfiles); - if (source->midx && fill_midx_entry(source->midx, oid, e)) - return 1; - } + struct packfile_list_entry *l; - for (source = r->objects->sources; source; source = source->next) { - struct packfile_list_entry *l; + packfile_store_prepare(store); + if (store->source->midx && fill_midx_entry(store->source->midx, oid, e)) + return 1; - for (l = source->packfiles->packs.head; l; l = l->next) { - struct packed_git *p = l->pack; + for (l = store->packs.head; l; l = l->next) { + struct packed_git *p = l->pack; - if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) { - if (!source->packfiles->skip_mru_updates) - packfile_list_prepend(&source->packfiles->packs, p); - return 1; - } + if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) { + if (!store->skip_mru_updates) + packfile_list_prepend(&store->packs, p); + return 1; } } @@ -2120,7 +2114,7 @@ int packfile_store_freshen_object(struct packfile_store *store, const struct object_id *oid) { struct pack_entry e; - if (!find_pack_entry(store->source->odb->repo, oid, &e)) + if (!find_pack_entry(store, oid, &e)) return 0; if (e.p->is_cruft) return 0; @@ -2141,7 +2135,7 @@ int packfile_store_read_object_info(struct packfile_store *store, struct pack_entry e; int rtype; - if (!find_pack_entry(store->source->odb->repo, oid, &e)) + if (!find_pack_entry(store, oid, &e)) return 1; /* @@ -2217,8 +2211,17 @@ struct packed_git **packfile_store_get_kept_pack_cache(struct packfile_store *st int has_object_pack(struct repository *r, const struct object_id *oid) { + struct odb_source *source; struct pack_entry e; - return find_pack_entry(r, oid, &e); + + odb_prepare_alternates(r->objects); + for (source = r->objects->sources; source; source = source->next) { + int ret = find_pack_entry(source->packfiles, oid, &e); + if (ret) + return ret; + } + + return 0; } int has_object_kept_pack(struct repository *r, const struct object_id *oid, From a282a8f163fa70f9eacc880e6188141cef917058 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Fri, 9 Jan 2026 09:33:18 +0100 Subject: [PATCH 14/50] packfile: move MIDX into packfile store The multi-pack index still is tracked as a member of the object database source, but ultimately the MIDX is always tied to one specific packfile store. Move the structure into `struct packfile_store` accordingly. This ensures that the packfile store now keeps track of all data related to packfiles. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- midx.c | 14 +++++++------- odb.c | 8 +------- odb.h | 7 ------- packfile.c | 12 ++++++++---- packfile.h | 3 +++ 5 files changed, 19 insertions(+), 25 deletions(-) diff --git a/midx.c b/midx.c index dbb2aa68baa9a7..fa7a7e5d13f90e 100644 --- a/midx.c +++ b/midx.c @@ -96,7 +96,7 @@ static int midx_read_object_offsets(const unsigned char *chunk_start, struct multi_pack_index *get_multi_pack_index(struct odb_source *source) { packfile_store_prepare(source->packfiles); - return source->midx; + return source->packfiles->midx; } static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *source, @@ -709,12 +709,12 @@ int prepare_multi_pack_index_one(struct odb_source *source) if (!r->settings.core_multi_pack_index) return 0; - if (source->midx) + if (source->packfiles->midx) return 1; - source->midx = load_multi_pack_index(source); + source->packfiles->midx = load_multi_pack_index(source); - return !!source->midx; + return !!source->packfiles->midx; } int midx_checksum_valid(struct multi_pack_index *m) @@ -803,9 +803,9 @@ void clear_midx_file(struct repository *r) struct odb_source *source; for (source = r->objects->sources; source; source = source->next) { - if (source->midx) - close_midx(source->midx); - source->midx = NULL; + if (source->packfiles->midx) + close_midx(source->packfiles->midx); + source->packfiles->midx = NULL; } } diff --git a/odb.c b/odb.c index f159fbdd991bd1..902251f9edb72a 100644 --- a/odb.c +++ b/odb.c @@ -1078,14 +1078,8 @@ struct object_database *odb_new(struct repository *repo, void odb_close(struct object_database *o) { struct odb_source *source; - - for (source = o->sources; source; source = source->next) { + for (source = o->sources; source; source = source->next) packfile_store_close(source->packfiles); - if (source->midx) - close_midx(source->midx); - source->midx = NULL; - } - close_commit_graph(o); } diff --git a/odb.h b/odb.h index c97b41c58cd447..300c3c0c46cb22 100644 --- a/odb.h +++ b/odb.h @@ -54,13 +54,6 @@ struct odb_source { /* Should only be accessed directly by packfile.c and midx.c. */ struct packfile_store *packfiles; - /* - * private data - * - * should only be accessed directly by packfile.c and midx.c - */ - struct multi_pack_index *midx; - /* * Figure out whether this is the local source of the owning * repository, which would typically be its ".git/objects" directory. diff --git a/packfile.c b/packfile.c index 0e4c63e11d1d4d..097dd8d85d374e 100644 --- a/packfile.c +++ b/packfile.c @@ -990,7 +990,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len, size_t base_len = full_name_len; if (strip_suffix_mem(full_name, &base_len, ".idx") && - !(data->source->midx && midx_contains_pack(data->source->midx, file_name))) { + !(data->source->packfiles->midx && + midx_contains_pack(data->source->packfiles->midx, file_name))) { char *trimmed_path = xstrndup(full_name, full_name_len); packfile_store_load_pack(data->source->packfiles, trimmed_path, data->source->local); @@ -1087,8 +1088,8 @@ struct packfile_list_entry *packfile_store_get_packs(struct packfile_store *stor { packfile_store_prepare(store); - if (store->source->midx) { - struct multi_pack_index *m = store->source->midx; + if (store->midx) { + struct multi_pack_index *m = store->midx; for (uint32_t i = 0; i < m->num_packs + m->num_packs_in_base; i++) prepare_midx_pack(m, i); } @@ -2094,7 +2095,7 @@ static int find_pack_entry(struct packfile_store *store, struct packfile_list_entry *l; packfile_store_prepare(store); - if (store->source->midx && fill_midx_entry(store->source->midx, oid, e)) + if (store->midx && fill_midx_entry(store->midx, oid, e)) return 1; for (l = store->packs.head; l; l = l->next) { @@ -2454,6 +2455,9 @@ void packfile_store_close(struct packfile_store *store) BUG("want to close pack marked 'do-not-close'"); close_pack(e->pack); } + if (store->midx) + close_midx(store->midx); + store->midx = NULL; } struct odb_packed_read_stream { diff --git a/packfile.h b/packfile.h index 08a666d538213e..92baf8ee88ef88 100644 --- a/packfile.h +++ b/packfile.h @@ -101,6 +101,9 @@ struct packfile_store { unsigned flags; } kept_cache; + /* The multi-pack index that belongs to this specific packfile store. */ + struct multi_pack_index *midx; + /* * A map of packfile names to packed_git structs for tracking which * packs have been loaded already. From f6b262581a885a11e3e817bf635303e40b640f2a Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Fri, 9 Jan 2026 17:49:13 +0000 Subject: [PATCH 15/50] fsck: snapshot default refs before object walk Fsck has a race when operating on live repositories; consider the following simple script that writes new commits as fsck runs: #!/bin/bash git fsck & PID=$! while ps -p $PID >/dev/null; do sleep 3 git commit -q --allow-empty -m "Another commit" done Since fsck walks objects for connectivity and then reads the refs at the end to check, this can cause fsck to get confused and think that the new refs refer to missing commits and that new reflog entries are invalid. Running the above script in a clone of git.git results in the following (output ellipsized to remove additional errors of the same type): $ ./fsck-while-writing.sh Checking ref database: 100% (1/1), done. Checking object directories: 100% (256/256), done. warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line Checking objects: 100% (835091/835091), done. error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310 error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310 error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68 error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68 [...] error: HEAD: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09 error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a error: HEAD: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d error: refs/heads/mybranch invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310 error: refs/heads/mybranch: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310 error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68 error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68 [...] error: refs/heads/mybranch: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09 error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a error: refs/heads/mybranch: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d Checking connectivity: 833846, done. missing commit 6724f2dfede88bfa9445a333e06e78536c0c6c0d Verifying commits in commit graph: 100% (242243/242243), done. We can minimize the race opportunities by taking a snapshot of refs at program invocation, doing the connectivity check, and then checking the snapshotted refs afterward. This avoids races with regular refs between fsck and adding objects to the database, though it still leaves a race between a gc and fsck. We are less concerned about folks simultaneously running gc with fsck; though, if it becomes an issue, we could lock fsck during gc. We definitely do not want to lock fsck during operations that may add objects to the object store; that would be problematic for forges. Note that refs aren't the only problem, though; reflog entries and index entries could be problematic as well. For now we punt on index entries just leaving a TODO comment, and for reflogs we use a coarse solution of taking the time at the beginning of the program and ignoring reflog entries newer than that time. That may be imperfect if dealing with a network filesystem, so we leave TODO comment for those that want to improve that handling as well. As a high level overview: * In addition to fsck_handle_ref(), which now is only a few lines long to process a ref, there's also a snapshot_ref() which is called early in the program for each ref and takes all the error checking logic. * The iterating over refs that used to be in get_default_heads() plus a loop over the arguments now appears in shapshot_refs(). * There's a new process_refs() as well that kind of looks like the old get_default_heads() though it is streamlined due to the work done by snapshot_refs(). This combination of changes modifies the output of running the script (from the beginning of this commit message) to: $ ./fsck-while-writing.sh Checking ref database: 100% (1/1), done. Checking object directories: 100% (256/256), done. warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line Checking objects: 100% (835091/835091), done. Checking connectivity: 833846, done. Verifying commits in commit graph: 100% (242243/242243), done. While worries about live updates while running fsck is likely of most interest for forge operators, it may also benefit those with automated jobs (such as git maintenance) or even casual users who want to do other work in their clone while fsck is running. Helped-by: Junio C Hamano Helped-by: Jeff King Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/fsck.c | 162 +++++++++++++++++++++++++++++++++++++------------ 1 file changed, 122 insertions(+), 40 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 4979bc795e5d61..f671288026b4ef 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -51,6 +51,7 @@ static int show_progress = -1; static int show_dangling = 1; static int name_objects; static int check_references = 1; +static timestamp_t now; #define ERROR_OBJECT 01 #define ERROR_REACHABLE 02 #define ERROR_PACK 04 @@ -510,6 +511,9 @@ static int fsck_handle_reflog_ent(const char *refname, timestamp_t timestamp, int tz UNUSED, const char *message UNUSED, void *cb_data UNUSED) { + if (now && timestamp > now) + return 0; + if (verbose) fprintf_ln(stderr, _("Checking reflog %s->%s"), oid_to_hex(ooid), oid_to_hex(noid)); @@ -531,8 +535,22 @@ static int fsck_handle_reflog(const char *logname, void *cb_data) return 0; } -static int fsck_handle_ref(const struct reference *ref, void *cb_data UNUSED) +struct ref_snapshot { + char *refname; + struct object_id oid; + /* TODO: Maybe supplement with latest reflog entry info too? */ +}; + +struct snapshot { + size_t nr; + size_t alloc; + struct ref_snapshot *ref; + /* TODO: Consider also snapshotting the index of each worktree. */ +}; + +static int snapshot_ref(const struct reference *ref, void *cb_data) { + struct snapshot *snap = cb_data; struct object *obj; obj = parse_object(the_repository, ref->oid); @@ -556,6 +574,20 @@ static int fsck_handle_ref(const struct reference *ref, void *cb_data UNUSED) errors_found |= ERROR_REFS; } default_refs++; + + ALLOC_GROW(snap->ref, snap->nr + 1, snap->alloc); + snap->ref[snap->nr].refname = xstrdup(ref->name); + oidcpy(&snap->ref[snap->nr].oid, ref->oid); + snap->nr++; + + return 0; +} + +static int fsck_handle_ref(const struct reference *ref, void *cb_data UNUSED) +{ + struct object *obj; + + obj = parse_object(the_repository, ref->oid); obj->flags |= USED; fsck_put_object_name(&fsck_walk_options, ref->oid, "%s", ref->name); @@ -568,14 +600,35 @@ static int fsck_head_link(const char *head_ref_name, const char **head_points_at, struct object_id *head_oid); -static void get_default_heads(void) +static void snapshot_refs(struct snapshot *snap, int argc, const char **argv) { struct worktree **worktrees, **p; const char *head_points_at; struct object_id head_oid; + for (int i = 0; i < argc; i++) { + const char *arg = argv[i]; + struct object_id oid; + if (!repo_get_oid(the_repository, arg, &oid)) { + struct reference ref = { + .name = arg, + .oid = &oid, + }; + + snapshot_ref(&ref, snap); + continue; + } + error(_("invalid parameter: expected sha1, got '%s'"), arg); + errors_found |= ERROR_OBJECT; + } + + if (argc) { + include_reflogs = 0; + return; + } + refs_for_each_rawref(get_main_ref_store(the_repository), - fsck_handle_ref, NULL); + snapshot_ref, snap); worktrees = get_worktrees(); for (p = worktrees; *p; p++) { @@ -590,15 +643,52 @@ static void get_default_heads(void) .oid = &head_oid, }; - fsck_handle_ref(&ref, NULL); + snapshot_ref(&ref, snap); } strbuf_release(&refname); - if (include_reflogs) + /* + * TODO: Could use refs_for_each_reflog(...) to find + * latest entry instead of using a global 'now' for that + * purpose. + */ + } + free_worktrees(worktrees); + + /* Ignore reflogs newer than now */ + now = time(NULL); +} + + +static void free_snapshot_refs(struct snapshot *snap) +{ + for (size_t i = 0; i < snap->nr; i++) + free(snap->ref[i].refname); + free(snap->ref); +} + +static void process_refs(struct snapshot *snap) +{ + struct worktree **worktrees, **p; + + for (size_t i = 0; i < snap->nr; i++) { + struct reference ref = { + .name = snap->ref[i].refname, + .oid = &snap->ref[i].oid, + }; + fsck_handle_ref(&ref, NULL); + } + + if (include_reflogs) { + worktrees = get_worktrees(); + for (p = worktrees; *p; p++) { + struct worktree *wt = *p; + refs_for_each_reflog(get_worktree_ref_store(wt), fsck_handle_reflog, wt); + } + free_worktrees(worktrees); } - free_worktrees(worktrees); /* * Not having any default heads isn't really fatal, but @@ -963,8 +1053,12 @@ int cmd_fsck(int argc, const char *prefix, struct repository *repo UNUSED) { - int i; struct odb_source *source; + struct snapshot snap = { + .nr = 0, + .alloc = 0, + .ref = NULL + }; /* fsck knows how to handle missing promisor objects */ fetch_if_missing = 0; @@ -1000,6 +1094,17 @@ int cmd_fsck(int argc, if (check_references) fsck_refs(the_repository); + /* + * Take a snapshot of the refs before walking objects to avoid looking + * at a set of refs that may be changed by the user while we are walking + * objects. We can still walk over new objects that are added during the + * execution of fsck but won't miss any objects that were reachable. + */ + snapshot_refs(&snap, argc, argv); + + /* Ensure we get a "fresh" view of the odb */ + odb_reprepare(the_repository->objects); + if (connectivity_only) { for_each_loose_object(the_repository->objects, mark_loose_for_connectivity, NULL, 0); @@ -1041,42 +1146,18 @@ int cmd_fsck(int argc, errors_found |= ERROR_OBJECT; } - for (i = 0; i < argc; i++) { - const char *arg = argv[i]; - struct object_id oid; - if (!repo_get_oid(the_repository, arg, &oid)) { - struct object *obj = lookup_object(the_repository, - &oid); - - if (!obj || !(obj->flags & HAS_OBJ)) { - if (is_promisor_object(the_repository, &oid)) - continue; - error(_("%s: object missing"), oid_to_hex(&oid)); - errors_found |= ERROR_OBJECT; - continue; - } - - obj->flags |= USED; - fsck_put_object_name(&fsck_walk_options, &oid, - "%s", arg); - mark_object_reachable(obj); - continue; - } - error(_("invalid parameter: expected sha1, got '%s'"), arg); - errors_found |= ERROR_OBJECT; - } + /* Process the snapshotted refs and the reflogs. */ + process_refs(&snap); - /* - * If we've not been given any explicit head information, do the - * default ones from .git/refs. We also consider the index file - * in this case (ie this implies --cache). - */ - if (!argc) { - get_default_heads(); + /* If not given any explicit objects, process index files too. */ + if (!argc) keep_cache_objects = 1; - } - if (keep_cache_objects) { + /* + * TODO: Consider first walking these indexes in snapshot_refs, + * to snapshot where the index entries used to point, and then + * check those snapshotted locations here. + */ struct worktree **worktrees, **p; verify_index_checksum = 1; @@ -1149,5 +1230,6 @@ int cmd_fsck(int argc, } } + free_snapshot_refs(&snap); return errors_found; } From f0af8b4aae3397d6bbe68a3c09f558cab983eaff Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 9 Jan 2026 20:05:05 +0000 Subject: [PATCH 16/50] mingw: do resolve symlinks in `getcwd()` As pointed out in https://github.com/git-for-windows/git/issues/1676, the `git rev-parse --is-inside-work-tree` command currently fails when the current directory's path contains symbolic links. The underlying reason for this bug is that `getcwd()` is supposed to resolve symbolic links, but our `mingw_getcwd()` implementation did not. We do have all the building blocks for that, though: the `GetFinalPathByHandleW()` function will resolve symbolic links. However, we only called that function if `GetLongPathNameW()` failed, for historical reasons: the latter function was supported for a long time, but the former API function was introduced only with Windows Vista, and we used to support also Windows XP. With that support having been dropped, we are free to call the symbolic link-resolving function right away. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- compat/mingw.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/compat/mingw.c b/compat/mingw.c index f09b49ff21ddab..cf4f3c92e7a889 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -1239,18 +1239,16 @@ char *mingw_getcwd(char *pointer, int len) { wchar_t cwd[MAX_PATH], wpointer[MAX_PATH]; DWORD ret = GetCurrentDirectoryW(ARRAY_SIZE(cwd), cwd); + HANDLE hnd; if (!ret || ret >= ARRAY_SIZE(cwd)) { errno = ret ? ENAMETOOLONG : err_win_to_posix(GetLastError()); return NULL; } - ret = GetLongPathNameW(cwd, wpointer, ARRAY_SIZE(wpointer)); - if (!ret && GetLastError() == ERROR_ACCESS_DENIED) { - HANDLE hnd = CreateFileW(cwd, 0, - FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL, - OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL); - if (hnd == INVALID_HANDLE_VALUE) - return NULL; + hnd = CreateFileW(cwd, 0, + FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL, + OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL); + if (hnd != INVALID_HANDLE_VALUE) { ret = GetFinalPathNameByHandleW(hnd, wpointer, ARRAY_SIZE(wpointer), 0); CloseHandle(hnd); if (!ret || ret >= ARRAY_SIZE(wpointer)) @@ -1259,13 +1257,11 @@ char *mingw_getcwd(char *pointer, int len) return NULL; return pointer; } - if (!ret || ret >= ARRAY_SIZE(wpointer)) - return NULL; - if (GetFileAttributesW(wpointer) == INVALID_FILE_ATTRIBUTES) { + if (GetFileAttributesW(cwd) == INVALID_FILE_ATTRIBUTES) { errno = ENOENT; return NULL; } - if (xwcstoutf(pointer, wpointer, len) < 0) + if (xwcstoutf(pointer, cwd, len) < 0) return NULL; convert_slashes(pointer); return pointer; From 7997e36561b9f7c084ea37ec280708736ab3dcb4 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 9 Jan 2026 20:05:06 +0000 Subject: [PATCH 17/50] init: do parse _all_ core.* settings early In Git for Windows, `has_symlinks` is set to 0 by default. Therefore, we need to parse the config setting `core.symlinks` to know if it has been set to `true`. In `git init`, we must do that before copying the templates because they might contain symbolic links. Even if the support for symbolic links on Windows has not made it to upstream Git yet, we really should make sure that all the `core.*` settings are parsed before proceeding, as they might very well change the behavior of `git init` in a way the user intended. This fixes https://github.com/git-for-windows/git/issues/3414 Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- environment.c | 4 ++-- environment.h | 2 ++ setup.c | 2 +- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/environment.c b/environment.c index a770b5921d9546..b65b85a01f18cf 100644 --- a/environment.c +++ b/environment.c @@ -324,8 +324,8 @@ static enum fsync_component parse_fsync_components(const char *var, const char * return (current & ~negative) | positive; } -static int git_default_core_config(const char *var, const char *value, - const struct config_context *ctx, void *cb) +int git_default_core_config(const char *var, const char *value, + const struct config_context *ctx, void *cb) { /* This needs a better name */ if (!strcmp(var, "core.filemode")) { diff --git a/environment.h b/environment.h index 51898c99cd1e45..e61f843fdbb637 100644 --- a/environment.h +++ b/environment.h @@ -106,6 +106,8 @@ const char *strip_namespace(const char *namespaced_ref); int git_default_config(const char *, const char *, const struct config_context *, void *); +int git_default_core_config(const char *var, const char *value, + const struct config_context *ctx, void *cb); /* * TODO: All the below state either explicitly or implicitly relies on diff --git a/setup.c b/setup.c index 3a6a048620dd7d..b723f8b33931bd 100644 --- a/setup.c +++ b/setup.c @@ -2693,7 +2693,7 @@ int init_db(const char *git_dir, const char *real_git_dir, * have set up the repository format such that we can evaluate * includeIf conditions correctly in the case of re-initialization. */ - repo_config(the_repository, platform_core_config, NULL); + repo_config(the_repository, git_default_core_config, NULL); safe_create_dir(the_repository, git_dir, 0); From 0fcbb57f970f318df422ca756a269651885576b9 Mon Sep 17 00:00:00 2001 From: Karsten Blees Date: Fri, 9 Jan 2026 20:05:07 +0000 Subject: [PATCH 18/50] strbuf_readlink(): avoid calling `readlink()` twice in corner-cases The `strbuf_readlink()` function calls `readlink()`` twice if the hint argument specifies the exact size of the link target (e.g. by passing stat.st_size as returned by `lstat()`). This is necessary because `readlink(..., hint) == hint` could mean that the buffer was too small. Use `hint + 1` as buffer size to prevent this. Signed-off-by: Karsten Blees Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- strbuf.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/strbuf.c b/strbuf.c index 6c3851a7f84d72..44a8f6a554ee43 100644 --- a/strbuf.c +++ b/strbuf.c @@ -578,12 +578,12 @@ int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint) while (hint < STRBUF_MAXLINK) { ssize_t len; - strbuf_grow(sb, hint); - len = readlink(path, sb->buf, hint); + strbuf_grow(sb, hint + 1); + len = readlink(path, sb->buf, hint + 1); if (len < 0) { if (errno != ERANGE) break; - } else if (len < hint) { + } else if (len <= hint) { strbuf_setlen(sb, len); return 0; } From 21f368daab677724ca1a200c22d65b64b15117b5 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 9 Jan 2026 20:05:08 +0000 Subject: [PATCH 19/50] strbuf_readlink(): support link targets that exceed 2*PATH_MAX The `strbuf_readlink()` function refuses to read link targets that exceed 2*PATH_MAX (even if a sufficient size was specified by the caller). The reason that that limit is 2*PATH_MAX instead of PATH_MAX is that the symlink targets do not need to be normalized. After running `ln -s a/../a/../a/../a/../b c`, the target of the symlink `c` will not be normalized to `b` but instead be much longer. As such, symlink targets' lengths can far exceed PATH_MAX. They are frequently much longer than 2*PATH_MAX on Windows, which actually supports paths up to 32,767 characters, but sets PATH_MAX to 260 for backwards compatibility. For full details, see https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation Let's just hard-code the limit used by `strbuf_readlink()` to 32,767 and make it independent of the current platform's PATH_MAX. Based-on-a-patch-by: Karsten Blees Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- strbuf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index 44a8f6a554ee43..ec2b7afbe62133 100644 --- a/strbuf.c +++ b/strbuf.c @@ -566,7 +566,7 @@ ssize_t strbuf_write(struct strbuf *sb, FILE *f) return sb->len ? fwrite(sb->buf, 1, sb->len, f) : 0; } -#define STRBUF_MAXLINK (2*PATH_MAX) +#define STRBUF_MAXLINK (32767) int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint) { From aa7b8864d841f16044b0d79fce5baaec1830b3e3 Mon Sep 17 00:00:00 2001 From: Karsten Blees Date: Fri, 9 Jan 2026 20:05:09 +0000 Subject: [PATCH 20/50] trim_last_path_component(): avoid hard-coding the directory separator Currently, this function hard-codes the directory separator as the forward slash. However, on Windows the backslash character is valid, too. And we want to call this function in the upcoming support for symlinks on Windows with the symlink targets (which naturally use the canonical directory separator on Windows, which is _not_ the forward slash). Prepare that function to be useful also in that context. Signed-off-by: Karsten Blees Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- lockfile.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lockfile.c b/lockfile.c index 1d5ed016828746..67082a9caaeb18 100644 --- a/lockfile.c +++ b/lockfile.c @@ -19,14 +19,14 @@ static void trim_last_path_component(struct strbuf *path) int i = path->len; /* back up past trailing slashes, if any */ - while (i && path->buf[i - 1] == '/') + while (i && is_dir_sep(path->buf[i - 1])) i--; /* * then go backwards until a slash, or the beginning of the * string */ - while (i && path->buf[i - 1] != '/') + while (i && !is_dir_sep(path->buf[i - 1])) i--; strbuf_setlen(path, i); From 28a7e27cff717e5ef91f7445e6a418068608082d Mon Sep 17 00:00:00 2001 From: Colin Stagner Date: Fri, 9 Jan 2026 19:18:11 -0600 Subject: [PATCH 21/50] contrib/subtree: detect rewritten subtree commits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit git subtree split --prefix P detects splits that are outside of path prefix `P` and prunes them from history graph processing. This improves the performance of repeated `split --rejoin` with many different prefixes. Both before and after 83f9dad7d6 (contrib/subtree: fix split with squashed subtrees, 2025-09-09), the pruning logic does not detect **rebased** or **cherry-picked** git-subtree commits. If `split` encounters any of these commits, the split output may have incomplete history. All commits authored by git subtree merge [--squash] --prefix Q have a first or second parent that has *only* subtree commits as ancestors. When splitting a completely different path `P/`, it is safe to ignore: 1. the merged tree 2. the subtree parent 3. *all* of that parent's ancestry, which applies only to path `Q/` and not `P/`. But this relationship no longer holds if the git-subtree commit is rebased or otherwise reauthored. After a rebase, the former git-subtree commit will have other unrelated commits as ancestors. Ignoring these commits may exclude the history of `P/`, leading to incomplete `subtree split` output. The pruning logic relies solely on the `git-subtree-*:` trailers to detect git-subtree commits, which it blindly accepts without further validation. The split logic also takes its time about being wrong: `cmd_split()` execs a `git show` for *every* commit in the split range… twice. This is inefficient in a shell script. Add a "reality check" to ignore rebased or rewritten commits: * Rewrites of non-merge commits cannot be detected, so the new detector no longer looks for them. * Merges carry a `git-subtree-mainline:` trailer with the hash of the **first parent**. If this hash differs, or if the "merge" commit no longer has multiple parents, a rewrite has occurred. To increase speed, package this logic in a new method, `find_other_splits()`. Perform the check up-front by iterating over a single `git log`. Add ignored subtrees to: 1. the `notree` cache, which excludes them from the `split` history 2. a `prune` negative refs list. The negative refs prevent recursing into other subtrees. Since there are potentially a *lot* of these, cache them on disk and use rev-list's `--stdin` mode. Reported-by: George Signed-off-by: Colin Stagner Signed-off-by: Junio C Hamano --- contrib/subtree/git-subtree.sh | 141 +++++++++++++++++++---------- contrib/subtree/t/t7900-subtree.sh | 83 +++++++++++++++-- 2 files changed, 169 insertions(+), 55 deletions(-) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index 17106d1a721519..3ebe88cbeadb02 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -325,6 +325,12 @@ check_parents () { done } +# Usage: get_notree REV +get_notree () { + assert test $# = 1 + test -r "$cachedir/notree/$1" +} + # Usage: set_notree REV set_notree () { assert test $# = 1 @@ -511,6 +517,71 @@ find_existing_splits () { done || exit $? } +# Usage: find_other_splits DIR REV UNREVS... +# +# Scan history in REV UNREVS for other `git subtree split --rejoin` +# merge commits belonging to prefixes outside of DIR. These +# "other splits" don't contribute to DIR and can be ignored. +# +# If any such rejoins are found, +# +# * emit their second-parent as an UNREV, avoiding a +# potentially costly history traversal +# +# * mark the merge commit as "notree" to ignore it +find_other_splits () { + assert test $# -ge 2 + dir="${1%/}" + rev="$2" + shift 2 + debug "Looking for other splits with dir != $dir..." + + git log \ + --grep '^git-subtree-mainline:' \ + --no-patch \ + --no-show-signature \ + --format='hash: %H%nparents: %P%n%(trailers:key=git-subtree-dir,key=git-subtree-mainline,key=git-subtree-split)%nEND' \ + "$rev" ${@:+"$@"} | + while read -r key val + do + case "$key" in + hash:) + commit_hash="${val}" + commit_parents= + subtree_dir= + subtree_mainline= + subtree_split= + ;; + parents:) + commit_parents="${val}" ;; + git-subtree-dir:) + subtree_dir="${val%/}/" ;; + git-subtree-mainline:) + subtree_mainline="${val}" ;; + git-subtree-split:) + subtree_split="${val}" ;; + END) + # verify: + # * all git-subtree-* trailers are present + # * this subtree is outside of $dir + # * the first parent is the git-subtree-mainline: + # * the commit has at least two parents + if test -n "${subtree_dir}" && + test -n "${subtree_split}" && + test -n "${subtree_mainline}" && + test "${subtree_dir}" = "${subtree_dir#"${dir}/"}" && + test "${commit_parents}" != "${commit_parents#"$subtree_mainline "}" && + rev_exists "${commit_hash}^2" + then + debug "find_other_splits excluding dir=$subtree_dir merged in ${commit_hash}" + echo "^${commit_hash}^2" + set_notree "${commit_hash}" + fi + ;; + esac + done +} + # Usage: copy_commit REV TREE FLAGS_STR copy_commit () { assert test $# = 3 @@ -785,42 +856,6 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } -# Usage: should_ignore_subtree_split_commit REV -# -# Check if REV is a commit from another subtree and should be -# ignored from processing for splits -should_ignore_subtree_split_commit () { - assert test $# = 1 - - git show \ - --no-patch \ - --no-show-signature \ - --format='%(trailers:key=git-subtree-dir,key=git-subtree-mainline)' \ - "$1" | - ( - have_mainline= - subtree_dir= - - while read -r trailer val - do - case "$trailer" in - git-subtree-dir:) - subtree_dir="${val%/}" ;; - git-subtree-mainline:) - have_mainline=y ;; - esac - done - - if test -n "${subtree_dir}" && - test -z "${have_mainline}" && - test "${subtree_dir}" != "$arg_prefix" - then - return 0 - fi - return 1 - ) -} - # Usage: process_split_commit REV PARENTS process_split_commit () { assert test $# = 2 @@ -994,31 +1029,39 @@ cmd_split () { fi unrevs="$(find_existing_splits "$dir" "$rev" "$repository")" || exit $? + (find_other_splits >"$cachedir/prune" "$dir" "$rev" $unrevs) || exit $? # We can't restrict rev-list to only $dir here, because some of our # parents have the $dir contents the root, and those won't match. # (and rev-list --follow doesn't seem to solve this) - grl='git rev-list --topo-order --reverse --parents $rev $unrevs' - revmax=$(eval "$grl" | wc -l) + revmax="$(git rev-list \ + <"$cachedir/prune" \ + --topo-order \ + --reverse \ + --parents \ + --stdin \ + --count \ + "$rev" \ + $unrevs + )" revcount=0 createcount=0 extracount=0 - eval "$grl" | + git rev-list \ + <"$cachedir/prune" \ + --topo-order \ + --reverse \ + --parents \ + --stdin \ + "$rev" \ + $unrevs | while read rev parents do - if should_ignore_subtree_split_commit "$rev" + if get_notree "$rev" then continue fi - parsedparents='' - for parent in $parents - do - if ! should_ignore_subtree_split_commit "$parent" - then - parsedparents="$parsedparents$parent " - fi - done - process_split_commit "$rev" "$parsedparents" + process_split_commit "$rev" "$parents" done || exit $? latest_new=$(cache_get latest_new) || exit $? diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh index 316dc5269e2b6f..4db3a6eff37c4d 100755 --- a/contrib/subtree/t/t7900-subtree.sh +++ b/contrib/subtree/t/t7900-subtree.sh @@ -411,8 +411,9 @@ test_expect_success 'split sub dir/ with --rejoin' ' git fetch ./"sub proj" HEAD && git subtree merge --prefix="sub dir" FETCH_HEAD && split_hash=$(git subtree split --prefix="sub dir" --annotate="*") && - git subtree split --prefix="sub dir" --annotate="*" --rejoin && - test "$(last_commit_subject)" = "Split '\''sub dir/'\'' into commit '\''$split_hash'\''" + git subtree split --prefix="sub dir" --annotate="*" -b spl --rejoin && + test "$(last_commit_subject)" = "Split '\''sub dir/'\'' into commit '\''$split_hash'\''" && + test "$(git rev-list --count spl)" -eq 5 ) ' @@ -442,18 +443,25 @@ test_expect_success 'split with multiple subtrees' ' git -C "$test_count" subtree add --prefix=subADir FETCH_HEAD && git -C "$test_count" fetch ./subB HEAD && git -C "$test_count" subtree add --prefix=subBDir FETCH_HEAD && + test "$(git -C "$test_count" rev-list --count main)" -eq 7 && test_create_commit "$test_count" subADir/main-subA1 && test_create_commit "$test_count" subBDir/main-subB1 && git -C "$test_count" subtree split --prefix=subADir \ - --squash --rejoin -m "Sub A Split 1" && + --squash --rejoin -m "Sub A Split 1" -b a1 && + test "$(git -C "$test_count" rev-list --count main..a1)" -eq 1 && git -C "$test_count" subtree split --prefix=subBDir \ - --squash --rejoin -m "Sub B Split 1" && + --squash --rejoin -m "Sub B Split 1" -b b1 && + test "$(git -C "$test_count" rev-list --count main..b1)" -eq 1 && test_create_commit "$test_count" subADir/main-subA2 && test_create_commit "$test_count" subBDir/main-subB2 && git -C "$test_count" subtree split --prefix=subADir \ - --squash --rejoin -m "Sub A Split 2" && + --squash --rejoin -m "Sub A Split 2" -b a2 && + test "$(git -C "$test_count" rev-list --count main..a2)" -eq 2 && + test "$(git -C "$test_count" rev-list --count a1..a2)" -eq 1 && test "$(git -C "$test_count" subtree split --prefix=subBDir \ - --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" + --squash --rejoin -d -m "Sub B Split 1" -b b2 2>&1 | grep -w "\[1\]")" = "" && + test "$(git -C "$test_count" rev-list --count main..b2)" -eq 2 && + test "$(git -C "$test_count" rev-list --count b1..b2)" -eq 1 ' # When subtree split-ing a directory that has other subtree @@ -477,6 +485,7 @@ do test_path_is_file subA/file1.t && test_path_is_file subA/subB/file2.t && git subtree split --prefix=subA --branch=bsplit && + test "$(git rev-list --count bsplit)" -eq 2 && git checkout bsplit && test_path_is_file file1.t && test_path_is_file subB/file2.t && @@ -489,6 +498,7 @@ do --prefix=subA/subB mksubtree && test_path_is_file subA/subB/file3.t && git subtree split --prefix=subA --branch=bsplit && + test "$(git rev-list --count bsplit)" -eq 3 && git checkout bsplit && test_path_is_file file1.t && test_path_is_file subB/file2.t && @@ -497,6 +507,67 @@ do ' done +# Usually, +# +# git subtree merge -P subA --squash f00... +# +# makes two commits, in this order: +# +# 1. Squashed 'subA/' content from commit f00... +# 2. Merge commit (1) as 'subA' +# +# Commit 1 updates the subtree but does *not* rewrite paths. +# Commit 2 rewrites all trees to start with `subA/` +# +# Commit 1 either has no parents or depends only on other +# "Squashed 'subA/' content" commits. +# +# For merge without --squash, subtree produces just one commit: +# a merge commit with git-subtree trailers. +# +# In either case, if the user rebases these commits, they will +# still have the git-subtree-* trailers… but will NOT have +# the layout described above. +# +# Test that subsequent `git subtree split` are not confused by this. +test_expect_success 'split with rebased subtree commit' ' + subtree_test_create_repo "$test_count" && + ( + cd "$test_count" && + test_commit file0 && + test_create_subtree_add \ + . mksubtree subA file1 --squash && + test_path_is_file subA/file1.t && + mkdir subB && + test_commit subB/bfile && + git commit --amend -F - <<'EOF' && +Squashed '\''subB/'\'' content from commit '\''badf00da911bbe895347b4b236f5461d55dc9877'\'' + +Simulate a cherry-picked or rebased subtree commit. + +git-subtree-dir: subB +git-subtree-split: badf00da911bbe895347b4b236f5461d55dc9877 +EOF + test_commit subA/file2 && + test_commit subB/bfile2 && + git commit --amend -F - <<'EOF' && +Split '\''subB/'\'' into commit '\''badf00da911bbe895347b4b236f5461d55dc9877'\'' + +Simulate a cherry-picked or rebased subtree commit. + +git-subtree-dir: subB +git-subtree-mainline: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +git-subtree-split: badf00da911bbe895347b4b236f5461d55dc9877 +EOF + git subtree split --prefix=subA --branch=bsplit && + git checkout bsplit && + test_path_is_file file1.t && + test_path_is_file file2.t && + test "$(last_commit_subject)" = "subA/file2" && + test "$(git rev-list --count bsplit)" -eq 2 + ) +' + test_expect_success 'split sub dir/ with --rejoin from scratch' ' subtree_test_create_repo "$test_count" && test_create_commit "$test_count" main1 && From a8227ae8d5410e54c5788283e35b42b7bf5b22ff Mon Sep 17 00:00:00 2001 From: KJ Tsanaktsidis Date: Mon, 12 Jan 2026 01:44:39 +0000 Subject: [PATCH 22/50] http-backend: write newlines to stderr when responding with errors The not_found and forbidden methods currently do not write a newline to stderr after the error message. This means that if git-http-backend is invoked through something like fcgiwrap, and the stderr of that fcgiwrap process is sent to a logging daemon (e.g. journald), the error messages of several git-http-backend invocations will just get strung together, e.g. > Not a git repository: '/var/lib/git/foo.git'Not a git repository: '/var/lib/git/foo.git'Not a git repository: '/var/lib/git/foo.git' I think it's git-http-backend's responsibility to format these messages properly, rather than it being fcgiwrap's job to notice that the script didn't terminate stderr with a newline and do so itself. Signed-off-by: KJ Tsanaktsidis Signed-off-by: Junio C Hamano --- http-backend.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/http-backend.c b/http-backend.c index 52f0483dd309d7..8c810cfcbddc73 100644 --- a/http-backend.c +++ b/http-backend.c @@ -143,8 +143,10 @@ static NORETURN void not_found(struct strbuf *hdr, const char *err, ...) end_headers(hdr); va_start(params, err); - if (err && *err) + if (err && *err) { vfprintf(stderr, err, params); + putc('\n', stderr); + } va_end(params); exit(0); } @@ -159,8 +161,10 @@ static NORETURN void forbidden(struct strbuf *hdr, const char *err, ...) end_headers(hdr); va_start(params, err); - if (err && *err) + if (err && *err) { vfprintf(stderr, err, params); + putc('\n', stderr); + } va_end(params); exit(0); } From 220f888d7e2f3d8370a99992785fc0f005913540 Mon Sep 17 00:00:00 2001 From: Pushkar Singh Date: Sun, 11 Jan 2026 19:07:52 +0000 Subject: [PATCH 23/50] t1410: use test helpers in reflog rewind test Replace raw `test -f` and `! test -f` checks in the rewind test with `test_path_is_file` and `test_path_is_missing`. This provides clearer failure diagnostics and keeps the test consistent with the rest of the test suite. Signed-off-by: Pushkar Singh Signed-off-by: Junio C Hamano --- t/t1410-reflog.sh | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh index e30f87a35812b8..ce71f9a30ae1ee 100755 --- a/t/t1410-reflog.sh +++ b/t/t1410-reflog.sh @@ -130,10 +130,10 @@ test_expect_success 'pass through -- to sub-command' ' test_expect_success rewind ' test_tick && git reset --hard HEAD~2 && - test -f C && - test -f A/B/E && - ! test -f F && - ! test -f A/G && + test_path_is_file C && + test_path_is_file A/B/E && + test_path_is_missing F && + test_path_is_missing A/G && check_have A B C D E F G H I J K L && From 123e8186a72cd71863668b6acb23a2faa30e6968 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:00:41 +0100 Subject: [PATCH 24/50] object-file: always set OI_LOOSE when reading object info There are some early returns in `odb_source_loose_read_object_info()` in cases where we don't have to open the loose object. These return paths do not set `struct object_info::whence` to `OI_LOOSE` though, so it becomes impossible for the caller to tell the format of such an object. The root cause of this really is that we have so many different return paths in the function. As a consequence, it's harder than necessary to make sure that all successful exit paths sot up the `whence` field as expected. Address this by refactoring the function to have a single exit path. Like this, we can trivially set up the `whence` field when we exit successfully from the function. Note that we also: - Rename `status` to `ret` to match our usual coding style, but also to show that the old `status` variable is now always getting the expected value. Furthermore, the value is not initialized anymore, which has the consequence that most compilers will warn for exit paths where we forgot to set it. - Move the setup of scratch pointers closer to `parse_loose_header()` to show where it's needed. - Guard a couple of variables on cleanup so that they only get released in case they have been set up. - Reset `oi->delta_base_oid` towards the end of the function, together with all the other object info pointers. Overall, all these changes result in a diff that is somewhat hard to read. But the end result is significantly easier to read and reason about, so I'd argue this one-time churn is worth it. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- object-file.c | 115 +++++++++++++++++++++++++++++++------------------- 1 file changed, 71 insertions(+), 44 deletions(-) diff --git a/object-file.c b/object-file.c index 6280e42f3412c3..e7e4c3348f9c1b 100644 --- a/object-file.c +++ b/object-file.c @@ -416,19 +416,16 @@ int odb_source_loose_read_object_info(struct odb_source *source, const struct object_id *oid, struct object_info *oi, int flags) { - int status = 0; + int ret; int fd; unsigned long mapsize; const char *path; - void *map; - git_zstream stream; + void *map = NULL; + git_zstream stream, *stream_to_end = NULL; char hdr[MAX_HEADER_LEN]; unsigned long size_scratch; enum object_type type_scratch; - if (oi && oi->delta_base_oid) - oidclr(oi->delta_base_oid, source->odb->repo->hash_algo); - /* * If we don't care about type or size, then we don't * need to look inside the object at all. Note that we @@ -439,71 +436,101 @@ int odb_source_loose_read_object_info(struct odb_source *source, */ if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) { struct stat st; - if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) - return quick_has_loose(source->loose, oid) ? 0 : -1; - if (stat_loose_object(source->loose, oid, &st, &path) < 0) - return -1; + + if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) { + ret = quick_has_loose(source->loose, oid) ? 0 : -1; + goto out; + } + + if (stat_loose_object(source->loose, oid, &st, &path) < 0) { + ret = -1; + goto out; + } + if (oi && oi->disk_sizep) *oi->disk_sizep = st.st_size; - return 0; + + ret = 0; + goto out; } fd = open_loose_object(source->loose, oid, &path); if (fd < 0) { if (errno != ENOENT) error_errno(_("unable to open loose object %s"), oid_to_hex(oid)); - return -1; + ret = -1; + goto out; } - map = map_fd(fd, path, &mapsize); - if (!map) - return -1; - if (!oi->sizep) - oi->sizep = &size_scratch; - if (!oi->typep) - oi->typep = &type_scratch; + map = map_fd(fd, path, &mapsize); + if (!map) { + ret = -1; + goto out; + } if (oi->disk_sizep) *oi->disk_sizep = mapsize; + stream_to_end = &stream; + switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr))) { case ULHR_OK: - if (parse_loose_header(hdr, oi) < 0) - status = error(_("unable to parse %s header"), oid_to_hex(oid)); - else if (*oi->typep < 0) + if (!oi->sizep) + oi->sizep = &size_scratch; + if (!oi->typep) + oi->typep = &type_scratch; + + if (parse_loose_header(hdr, oi) < 0) { + ret = error(_("unable to parse %s header"), oid_to_hex(oid)); + goto corrupt; + } + + if (*oi->typep < 0) die(_("invalid object type")); - if (!oi->contentp) - break; - *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid); - if (*oi->contentp) - goto cleanup; + if (oi->contentp) { + *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid); + if (!*oi->contentp) { + ret = -1; + goto corrupt; + } + } - status = -1; break; case ULHR_BAD: - status = error(_("unable to unpack %s header"), - oid_to_hex(oid)); - break; + ret = error(_("unable to unpack %s header"), + oid_to_hex(oid)); + goto corrupt; case ULHR_TOO_LONG: - status = error(_("header for %s too long, exceeds %d bytes"), - oid_to_hex(oid), MAX_HEADER_LEN); - break; + ret = error(_("header for %s too long, exceeds %d bytes"), + oid_to_hex(oid), MAX_HEADER_LEN); + goto corrupt; } - if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT)) + ret = 0; + +corrupt: + if (ret && (flags & OBJECT_INFO_DIE_IF_CORRUPT)) die(_("loose object %s (stored in %s) is corrupt"), oid_to_hex(oid), path); -cleanup: - git_inflate_end(&stream); - munmap(map, mapsize); - if (oi->sizep == &size_scratch) - oi->sizep = NULL; - if (oi->typep == &type_scratch) - oi->typep = NULL; - oi->whence = OI_LOOSE; - return status; +out: + if (stream_to_end) + git_inflate_end(stream_to_end); + if (map) + munmap(map, mapsize); + if (oi) { + if (oi->sizep == &size_scratch) + oi->sizep = NULL; + if (oi->typep == &type_scratch) + oi->typep = NULL; + if (oi->delta_base_oid) + oidclr(oi->delta_base_oid, source->odb->repo->hash_algo); + if (!ret) + oi->whence = OI_LOOSE; + } + + return ret; } static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c, From 7a4bd1b836c7437dfb8ff3650096750b98a6b3e4 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:00:42 +0100 Subject: [PATCH 25/50] packfile: always declare object info to be OI_PACKED When reading object info via a packfile we yield one of two types: - The object can either be OI_PACKED, which is what a caller would typically expect. - Or it can be OI_DBCACHED if it is stored in the delta base cache. The latter really is an implementation detail though, and callers typically don't care at all about the difference. Furthermore, the information whether or not it is part of the delta base cache can already be derived via the `is_delta` field, so the fact that we discern between OI_PACKED and OI_DBCACHED only further complicates the interface. There aren't all that many callers that care about the `whence` field in the first place. In fact, there's only three: - `packfile_store_read_object_info()` checks for `whence == OI_PACKED` and then populates the packfile information of the object info structure. We now start to do this also for deltified objects, which gives its callers strictly more information. - `repack_local_links()` wants to determine whether the object is part of a promisor pack and checks for `whence == OI_PACKED`. If so, it verifies that the packfile is a promisor pack. It's arguably wrong to declare that an object is not part of a promisor pack only because it is stored in the delta base cache. - `is_not_in_promisor_pack_obj()` does the same, but checks that a specific object is _not_ part of a promisor pack. The same reasoning as above applies. Drop the OI_DBCACHED enum completely. None of the callers seem to care about the distinction. Note that this also fixes a segfault introduced in 8c1b84bc97 (streaming: move logic to read packed objects streams into backend, 2025-11-23), which refactors how we stream packed objects. The intent is to only read packed objects in case they are stored non-deltified as we'd otherwise have to deflate them first. But the check for whether or not the object is stored as a delta was unconditionally done via `oi.u.packed.is_delta`, which is only valid in case `oi.whence` is `OI_PACKED`. But under some circumstances we got `OI_DBCACHED` here, which means that none of the `oi.u.packed` fields were initialized at all. Consequently, we assumed the object was not stored as a delta, and then try to read the object from `oi.u.packed.pack`, which is a `NULL` pointer and thus causes a segfault. Add a test case for this issue so that this cannot regress in the future anymore. Reported-by: Matt Smiley Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- odb.h | 1 - packfile.c | 3 +-- t/t5003-archive-zip.sh | 34 ++++++++++++++++++++++++++++++++++ 3 files changed, 35 insertions(+), 3 deletions(-) diff --git a/odb.h b/odb.h index 014cd9585a2f6e..73b0b87ad55cf2 100644 --- a/odb.h +++ b/odb.h @@ -330,7 +330,6 @@ struct object_info { OI_CACHED, OI_LOOSE, OI_PACKED, - OI_DBCACHED } whence; union { /* diff --git a/packfile.c b/packfile.c index 08a0863fc3374c..b0c6665c878d9e 100644 --- a/packfile.c +++ b/packfile.c @@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, oidclr(oi->delta_base_oid, p->repo->hash_algo); } - oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED : - OI_PACKED; + oi->whence = OI_PACKED; out: unuse_pack(&w_curs); diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh index 961c6aac256135..c8c1c5c06b6037 100755 --- a/t/t5003-archive-zip.sh +++ b/t/t5003-archive-zip.sh @@ -239,6 +239,40 @@ check_zip with_untracked2 check_added with_untracked2 untracked one/untracked check_added with_untracked2 untracked two/untracked +test_expect_success 'git-archive --format=zip with bigFile delta chains' ' + test_when_finished rm -rf repo && + git init repo && + ( + cd repo && + test-tool genrandom foo 100000 >base && + { + cat base && + echo "trailing data" + } >delta-1 && + { + cat delta-1 && + echo "trailing data" + } >delta-2 && + git add . && + git commit -m "blobs" && + git repack -Ad && + git verify-pack -v .git/objects/pack/pack-*.idx >stats && + test_grep "chain length = 1: 1 object" stats && + test_grep "chain length = 2: 1 object" stats && + + git -c core.bigFileThreshold=1k archive --format=zip HEAD >archive.zip && + if test_have_prereq UNZIP + then + mkdir unpack && + cd unpack && + "$GIT_UNZIP" ../archive.zip && + test_cmp base ../base && + test_cmp delta-1 ../delta-1 && + test_cmp delta-2 ../delta-2 + fi + ) +' + # Test remote archive over HTTP protocol. # # Note: this should be the last part of this test suite, because From 27d9486cbc37a44565e4a97a84089c85741d4cd8 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:00:43 +0100 Subject: [PATCH 26/50] packfile: extend `is_delta` field to allow for "unknown" state The `struct object_info::u::packed::is_delta` field determines whether or not a specific object is stored as a delta. It only stores whether or not the object is stored as delta, so it is treated as a boolean value. This boolean is insufficient though: when reading a packed object via `packfile_store_read_object_info()` we know to skip parsing the actual object when the user didn't request any object-specific data. In that case we won't read the object itself, but will only look up its position in the packfile. Consequently, we do not know whether it is a delta or not. This isn't really an issue right now, as the check for an empty request is broken. But a subsequent commit will fix it, and once we do we will have the need to also represent an "unknown" delta state. Prepare for this change by introducing a new enum that encodes the object type. We don't use the "unknown" state just yet, but will start to do so in a subsequent commit. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- odb.h | 7 ++++++- packfile.c | 17 ++++++++++++++--- 2 files changed, 20 insertions(+), 4 deletions(-) diff --git a/odb.h b/odb.h index 73b0b87ad55cf2..afae5e5c0190ad 100644 --- a/odb.h +++ b/odb.h @@ -343,7 +343,12 @@ struct object_info { struct { struct packed_git *pack; off_t offset; - unsigned int is_delta; + enum packed_object_type { + PACKED_OBJECT_TYPE_UNKNOWN, + PACKED_OBJECT_TYPE_FULL, + PACKED_OBJECT_TYPE_OFS_DELTA, + PACKED_OBJECT_TYPE_REF_DELTA, + } type; } packed; } u; }; diff --git a/packfile.c b/packfile.c index b0c6665c878d9e..cc797b2b6ab7b8 100644 --- a/packfile.c +++ b/packfile.c @@ -2159,8 +2159,18 @@ int packfile_store_read_object_info(struct packfile_store *store, if (oi->whence == OI_PACKED) { oi->u.packed.offset = e.offset; oi->u.packed.pack = e.p; - oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA || - rtype == OBJ_OFS_DELTA); + + switch (rtype) { + case OBJ_REF_DELTA: + oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA; + break; + case OBJ_OFS_DELTA: + oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA; + break; + default: + oi->u.packed.type = PACKED_OBJECT_TYPE_FULL; + break; + } } return 0; @@ -2531,7 +2541,8 @@ int packfile_store_read_object_stream(struct odb_read_stream **out, oi.sizep = &size; if (packfile_store_read_object_info(store, oid, &oi, 0) || - oi.u.packed.is_delta || + oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA || + oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA || repo_settings_get_big_file_threshold(store->odb->repo) >= size) return -1; From 57c168dc3824f1aaad553f90f55fdc86b75de561 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:00:44 +0100 Subject: [PATCH 27/50] packfile: always populate pack-specific info when reading object info When reading object information via `packed_object_info()` we may not populate the object info's packfile-specific fields. This leads to inconsistent object info depending on whether the info was populated via `packfile_store_read_object_info()` or `packed_object_info()`. Fix this inconsistency so that we can always assume the pack info to be populated when reading object info from a pack. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 31 ++++++++++++++----------------- 1 file changed, 14 insertions(+), 17 deletions(-) diff --git a/packfile.c b/packfile.c index cc797b2b6ab7b8..f7c33a2f77a966 100644 --- a/packfile.c +++ b/packfile.c @@ -1657,6 +1657,20 @@ int packed_object_info(struct repository *r, struct packed_git *p, } oi->whence = OI_PACKED; + oi->u.packed.offset = obj_offset; + oi->u.packed.pack = p; + + switch (type) { + case OBJ_REF_DELTA: + oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA; + break; + case OBJ_OFS_DELTA: + oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA; + break; + default: + oi->u.packed.type = PACKED_OBJECT_TYPE_FULL; + break; + } out: unuse_pack(&w_curs); @@ -2156,23 +2170,6 @@ int packfile_store_read_object_info(struct packfile_store *store, return -1; } - if (oi->whence == OI_PACKED) { - oi->u.packed.offset = e.offset; - oi->u.packed.pack = e.p; - - switch (rtype) { - case OBJ_REF_DELTA: - oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA; - break; - case OBJ_OFS_DELTA: - oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA; - break; - default: - oi->u.packed.type = PACKED_OBJECT_TYPE_FULL; - break; - } - } - return 0; } From 8908c303da91400e8d9643da6fc697a081ac7374 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:00:45 +0100 Subject: [PATCH 28/50] packfile: disentangle return value of `packed_object_info()` The `packed_object_info()` function returns the type of the packed object. While we use an `enum object_type` to store the return value, this type is not to be confused with the actual object type. It _may_ contain the object type, but it may just as well encode that the given packed object is stored as a delta. We have removed the only caller that relied on this returned object type in the preceding commit, so let's simplify semantics and return either 0 on success or a negative error code otherwise. This unblocks a small optimization where we can skip reading the object type altogether. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 21 ++++++++++++--------- packfile.h | 4 ++++ 2 files changed, 16 insertions(+), 9 deletions(-) diff --git a/packfile.c b/packfile.c index f7c33a2f77a966..8c6ef45a6727bd 100644 --- a/packfile.c +++ b/packfile.c @@ -1587,6 +1587,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, unsigned long size; off_t curpos = obj_offset; enum object_type type; + int ret; /* * We always get the representation type, but only convert it to @@ -1607,12 +1608,12 @@ int packed_object_info(struct repository *r, struct packed_git *p, off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos, type, obj_offset); if (!base_offset) { - type = OBJ_BAD; + ret = -1; goto out; } *oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos); if (*oi->sizep == 0) { - type = OBJ_BAD; + ret = -1; goto out; } } else { @@ -1625,7 +1626,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, if (offset_to_pack_pos(p, obj_offset, &pos) < 0) { error("could not find object at offset %"PRIuMAX" " "in pack %s", (uintmax_t)obj_offset, p->pack_name); - type = OBJ_BAD; + ret = -1; goto out; } @@ -1639,7 +1640,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, if (oi->typep) *oi->typep = ptot; if (ptot < 0) { - type = OBJ_BAD; + ret = -1; goto out; } } @@ -1649,7 +1650,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, if (get_delta_base_oid(p, &w_curs, curpos, oi->delta_base_oid, type, obj_offset) < 0) { - type = OBJ_BAD; + ret = -1; goto out; } } else @@ -1672,9 +1673,11 @@ int packed_object_info(struct repository *r, struct packed_git *p, break; } + ret = 0; + out: unuse_pack(&w_curs); - return type; + return ret; } static void *unpack_compressed_entry(struct packed_git *p, @@ -2152,7 +2155,7 @@ int packfile_store_read_object_info(struct packfile_store *store, unsigned flags UNUSED) { struct pack_entry e; - int rtype; + int ret; if (!find_pack_entry(store->odb->repo, oid, &e)) return 1; @@ -2164,8 +2167,8 @@ int packfile_store_read_object_info(struct packfile_store *store, if (!oi) return 0; - rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi); - if (rtype < 0) { + ret = packed_object_info(store->odb->repo, e.p, e.offset, oi); + if (ret < 0) { mark_bad_packed_object(e.p, oid); return -1; } diff --git a/packfile.h b/packfile.h index 59d162a3f415e5..d7cce582aff090 100644 --- a/packfile.h +++ b/packfile.h @@ -378,6 +378,10 @@ void release_pack_memory(size_t); /* global flag to enable extra checks when accessing packed objects */ extern int do_check_packed_object_crc; +/* + * Look up the object info for a specific offset in the packfile. + * Returns zero on success, a negative error code otherwise. + */ int packed_object_info(struct repository *r, struct packed_git *pack, off_t offset, struct object_info *); From 5ff29698e077e712740691a1d15e8320115ebdbf Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:00:46 +0100 Subject: [PATCH 29/50] packfile: skip unpacking object header for disk size requests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While most of the object info requests for a packed object require us to unpack its headers, reading its disk size doesn't. We still unpack the object header in that case though, which is unnecessary work. Skip reading the header if only the disk size is requested. This leads to a small speedup when reading disk size, only. The following benchmark was done in the Git repository: Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~) Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms] Range (min … max): 103.7 ms … 106.0 ms 27 runs Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD) Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms] Range (min … max): 96.2 ms … 98.1 ms 30 runs Summary ./git rev-list --disk-usage HEAD (rev = HEAD) ran 1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~) Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/packfile.c b/packfile.c index 8c6ef45a6727bd..a2ba237ce76f57 100644 --- a/packfile.c +++ b/packfile.c @@ -1586,7 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, struct pack_window *w_curs = NULL; unsigned long size; off_t curpos = obj_offset; - enum object_type type; + enum object_type type = OBJ_NONE; int ret; /* @@ -1598,7 +1598,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, &type); if (!*oi->contentp) type = OBJ_BAD; - } else { + } else if (oi->sizep || oi->typep || oi->delta_base_oid) { type = unpack_object_header(p, &w_curs, &curpos, &size); } @@ -1662,6 +1662,9 @@ int packed_object_info(struct repository *r, struct packed_git *p, oi->u.packed.pack = p; switch (type) { + case OBJ_NONE: + oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN; + break; case OBJ_REF_DELTA: oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA; break; From 12d3b58b5552750f351ded7166b347446d9543f3 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:00:47 +0100 Subject: [PATCH 30/50] packfile: drop repository parameter from `packed_object_info()` The function `packed_object_info()` takes a packfile and offset and returns the object info for the corresponding object. Despite these two parameters though it also takes a repository pointer. This is redundant information though, as `struct packed_git` already has a repository pointer that is always populated. Drop the redundant parameter. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/cat-file.c | 3 +-- builtin/pack-objects.c | 4 ++-- commit-graph.c | 2 +- pack-bitmap.c | 3 +-- packfile.c | 8 ++++---- packfile.h | 3 +-- 6 files changed, 10 insertions(+), 13 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 505ddaa12f5309..2ad712e9f8f55c 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -487,8 +487,7 @@ static void batch_object_write(const char *obj_name, data->info.sizep = &data->size; if (pack) - ret = packed_object_info(the_repository, pack, - offset, &data->info); + ret = packed_object_info(pack, offset, &data->info); else ret = odb_read_object_info_extended(the_repository->objects, &data->oid, &data->info, diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 1ce8d6ee215326..85762f8c4fc4cc 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2411,7 +2411,7 @@ static void drop_reused_delta(struct object_entry *entry) oi.sizep = &size; oi.typep = &type; - if (packed_object_info(the_repository, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) { + if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) { /* * We failed to get the info from this pack for some reason; * fall back to odb_read_object_info, which may find another copy. @@ -3748,7 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid, struct object_info oi = OBJECT_INFO_INIT; oi.typep = &type; - if (packed_object_info(the_repository, p, ofs, &oi) < 0) { + if (packed_object_info(p, ofs, &oi) < 0) { die(_("could not get type of object %s in pack %s"), oid_to_hex(oid), p->pack_name); } else if (type == OBJ_COMMIT) { diff --git a/commit-graph.c b/commit-graph.c index 80be2ff2c39842..f572670bd01946 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1499,7 +1499,7 @@ static int add_packed_commits(const struct object_id *oid, display_progress(ctx->progress, ++ctx->progress_done); oi.typep = &type; - if (packed_object_info(ctx->r, pack, offset, &oi) < 0) + if (packed_object_info(pack, offset, &oi) < 0) die(_("unable to get type of object %s"), oid_to_hex(oid)); if (type != OBJ_COMMIT) diff --git a/pack-bitmap.c b/pack-bitmap.c index 8ca79725b1d438..972203f12b6d9b 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1876,8 +1876,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git, ofs = pack_pos_to_offset(pack, pos); } - if (packed_object_info(bitmap_repo(bitmap_git), pack, ofs, - &oi) < 0) { + if (packed_object_info(pack, ofs, &oi) < 0) { struct object_id oid; nth_bitmap_object_oid(bitmap_git, &oid, pack_pos_to_index(pack, pos)); diff --git a/packfile.c b/packfile.c index a2ba237ce76f57..39899aec494468 100644 --- a/packfile.c +++ b/packfile.c @@ -1580,7 +1580,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset, hashmap_add(&delta_base_cache, &ent->ent); } -int packed_object_info(struct repository *r, struct packed_git *p, +int packed_object_info(struct packed_git *p, off_t obj_offset, struct object_info *oi) { struct pack_window *w_curs = NULL; @@ -1594,7 +1594,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, * a "real" type later if the caller is interested. */ if (oi->contentp) { - *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep, + *oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep, &type); if (!*oi->contentp) type = OBJ_BAD; @@ -1635,7 +1635,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, if (oi->typep) { enum object_type ptot; - ptot = packed_to_object_type(r, p, obj_offset, + ptot = packed_to_object_type(p->repo, p, obj_offset, type, &w_curs, curpos); if (oi->typep) *oi->typep = ptot; @@ -2170,7 +2170,7 @@ int packfile_store_read_object_info(struct packfile_store *store, if (!oi) return 0; - ret = packed_object_info(store->odb->repo, e.p, e.offset, oi); + ret = packed_object_info(e.p, e.offset, oi); if (ret < 0) { mark_bad_packed_object(e.p, oid); return -1; diff --git a/packfile.h b/packfile.h index d7cce582aff090..33fed263620432 100644 --- a/packfile.h +++ b/packfile.h @@ -382,8 +382,7 @@ extern int do_check_packed_object_crc; * Look up the object info for a specific offset in the packfile. * Returns zero on success, a negative error code otherwise. */ -int packed_object_info(struct repository *r, - struct packed_git *pack, +int packed_object_info(struct packed_git *pack, off_t offset, struct object_info *); void mark_bad_packed_object(struct packed_git *, const struct object_id *); From df971a7c42fa5506cb7e845ac175104093c35e8d Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:50 +0100 Subject: [PATCH 31/50] refs/files: simplify iterating through root refs When iterating through root refs we first need to determine the directory in which the refs live. This is done by retrieving the root of the loose refs via `refs->loose->root->name`, and putting it through `files_ref_path()` to derive the final path. This is somewhat redundant though: the root name of the loose files cache is always going to be the empty string. As such, we always end up passing that empty string to `files_ref_path()` as the ref hierarchy we want to start. And this actually makes sense: `files_ref_path()` already computes the location of the root directory, so of course we need to pass the empty string for the ref hierarchy itself. So going via the loose ref cache to figure out that the root of a ref hierarchy is empty is only causing confusion. But next to the added confusion, it can also lead to a segfault. The loose ref cache is populated lazily, so it may not always be set. It seems to be sheer luck that this is a condition we do not currently hit. The right thing to do would be to call `get_loose_ref_cache()`, which knows to populate the cache if required. Simplify the code and fix the potential segfault by simply removing the indirection via the loose ref cache completely. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/files-backend.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 6f6f76a8d86dc4..297739f203d74e 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -354,13 +354,11 @@ static int for_each_root_ref(struct files_ref_store *refs, void *cb_data) { struct strbuf path = STRBUF_INIT, refname = STRBUF_INIT; - const char *dirname = refs->loose->root->name; struct dirent *de; - size_t dirnamelen; int ret; DIR *d; - files_ref_path(refs, &path, dirname); + files_ref_path(refs, &path, ""); d = opendir(path.buf); if (!d) { @@ -368,9 +366,6 @@ static int for_each_root_ref(struct files_ref_store *refs, return -1; } - strbuf_addstr(&refname, dirname); - dirnamelen = refname.len; - while ((de = readdir(d)) != NULL) { unsigned char dtype; @@ -378,6 +373,8 @@ static int for_each_root_ref(struct files_ref_store *refs, continue; if (ends_with(de->d_name, ".lock")) continue; + + strbuf_reset(&refname); strbuf_addstr(&refname, de->d_name); dtype = get_dtype(de, &path, 1); @@ -386,8 +383,6 @@ static int for_each_root_ref(struct files_ref_store *refs, if (ret) goto done; } - - strbuf_setlen(&refname, dirnamelen); } ret = 0; From e615643d2ecf2d4e44b0ca29cf949a5212618245 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:51 +0100 Subject: [PATCH 32/50] refs/files: move fsck functions into global scope When performing consistency checks we pass the functions that perform the verification down the calling stack. This is somewhat unnecessary though, as the set of functions doesn't ever change. Simplify the code by moving the array into global scope and remove the parameter. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/files-backend.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 297739f203d74e..feba3ee58bcc9b 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3890,11 +3890,16 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED, return ret; } +static const files_fsck_refs_fn fsck_refs_fn[]= { + files_fsck_refs_name, + files_fsck_refs_content, + NULL, +}; + static int files_fsck_refs_dir(struct ref_store *ref_store, struct fsck_options *o, const char *refs_check_dir, - struct worktree *wt, - files_fsck_refs_fn *fsck_refs_fn) + struct worktree *wt) { struct strbuf refname = STRBUF_INIT; struct strbuf sb = STRBUF_INIT; @@ -3955,13 +3960,7 @@ static int files_fsck_refs(struct ref_store *ref_store, struct fsck_options *o, struct worktree *wt) { - files_fsck_refs_fn fsck_refs_fn[]= { - files_fsck_refs_name, - files_fsck_refs_content, - NULL, - }; - - return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn); + return files_fsck_refs_dir(ref_store, o, "refs", wt); } static int files_fsck(struct ref_store *ref_store, From 5a74903e62d7f4acdb2849103a2763b160a763b1 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:52 +0100 Subject: [PATCH 33/50] refs/files: remove `refs_check_dir` parameter The parameter `refs_check_dir` determines which directory we want to check references for. But as we always want to check the complete refs hierarchy, this parameter is always set to "refs". Drop the parameter and hardcode it. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/files-backend.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index feba3ee58bcc9b..0a104c7bf6a8a5 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3898,7 +3898,6 @@ static const files_fsck_refs_fn fsck_refs_fn[]= { static int files_fsck_refs_dir(struct ref_store *ref_store, struct fsck_options *o, - const char *refs_check_dir, struct worktree *wt) { struct strbuf refname = STRBUF_INIT; @@ -3907,7 +3906,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store, int iter_status; int ret = 0; - strbuf_addf(&sb, "%s/%s", ref_store->gitdir, refs_check_dir); + strbuf_addf(&sb, "%s/refs", ref_store->gitdir); iter = dir_iterator_begin(sb.buf, 0); if (!iter) { @@ -3927,8 +3926,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store, if (!is_main_worktree(wt)) strbuf_addf(&refname, "worktrees/%s/", wt->id); - strbuf_addf(&refname, "%s/%s", refs_check_dir, - iter->relative_path); + strbuf_addf(&refname, "refs/%s", iter->relative_path); if (o->verbose) fprintf_ln(stderr, "Checking %s", refname.buf); @@ -3960,7 +3958,7 @@ static int files_fsck_refs(struct ref_store *ref_store, struct fsck_options *o, struct worktree *wt) { - return files_fsck_refs_dir(ref_store, o, "refs", wt); + return files_fsck_refs_dir(ref_store, o, wt); } static int files_fsck(struct ref_store *ref_store, From 2fe33ae20fcce7a1e91cfeec37409d511ab8aefb Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:53 +0100 Subject: [PATCH 34/50] refs/files: remove useless indirection The function `files_fsck_refs()` only has a single callsite and forwards all of its arguments as-is, so it's basically a useless indirection. Inline the function call. While at it, also remove the bitwise or that we have for return values. We don't really want to or them at all, but rather just want to return an error in case either of the functions has failed. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/files-backend.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 0a104c7bf6a8a5..4cbee23dad1d5b 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3954,22 +3954,20 @@ static int files_fsck_refs_dir(struct ref_store *ref_store, return ret; } -static int files_fsck_refs(struct ref_store *ref_store, - struct fsck_options *o, - struct worktree *wt) -{ - return files_fsck_refs_dir(ref_store, o, wt); -} - static int files_fsck(struct ref_store *ref_store, struct fsck_options *o, struct worktree *wt) { struct files_ref_store *refs = files_downcast(ref_store, REF_STORE_READ, "fsck"); + int ret = 0; - return files_fsck_refs(ref_store, o, wt) | - refs->packed_ref_store->be->fsck(refs->packed_ref_store, o, wt); + if (files_fsck_refs_dir(ref_store, o, wt) < 0) + ret = -1; + if (refs->packed_ref_store->be->fsck(refs->packed_ref_store, o, wt) < 0) + ret = -1; + + return ret; } struct ref_storage_be refs_be_files = { From 0ff9cf40b2ce3df2f6eda0875eb54fe3c3487f5b Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:54 +0100 Subject: [PATCH 35/50] refs/files: extract function to check single ref When checking the consistency of references we create a directory iterator and then verify each single reference in a loop. The logic to perform the actual checks is embedded into that loop, which makes it hard to reuse. But In a subsequent commit we're about to introduce a second path that wants to verify references. Prepare for this by extracting the logic to check a single reference into a standalone function. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/files-backend.c | 80 ++++++++++++++++++++++++++++---------------- 1 file changed, 51 insertions(+), 29 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 4cbee23dad1d5b..9972221f9f4819 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3715,7 +3715,8 @@ static int files_ref_store_remove_on_disk(struct ref_store *ref_store, typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, struct fsck_options *o, const char *refname, - struct dir_iterator *iter); + const char *path, + int mode); static int files_fsck_symref_target(struct fsck_options *o, struct fsck_ref_report *report, @@ -3772,7 +3773,8 @@ static int files_fsck_symref_target(struct fsck_options *o, static int files_fsck_refs_content(struct ref_store *ref_store, struct fsck_options *o, const char *target_name, - struct dir_iterator *iter) + const char *path, + int mode) { struct strbuf ref_content = STRBUF_INIT; struct strbuf abs_gitdir = STRBUF_INIT; @@ -3786,7 +3788,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store, report.path = target_name; - if (S_ISLNK(iter->st.st_mode)) { + if (S_ISLNK(mode)) { const char *relative_referent_path = NULL; ret = fsck_report_ref(o, &report, @@ -3798,7 +3800,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store, if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1])) strbuf_addch(&abs_gitdir, '/'); - strbuf_add_real_path(&ref_content, iter->path.buf); + strbuf_add_real_path(&ref_content, path); skip_prefix(ref_content.buf, abs_gitdir.buf, &relative_referent_path); @@ -3811,7 +3813,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store, goto cleanup; } - if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) { + if (strbuf_read_file(&ref_content, path, 0) < 0) { /* * Ref file could be removed by another concurrent process. We should * ignore this error and continue to the next ref. @@ -3819,7 +3821,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store, if (errno == ENOENT) goto cleanup; - ret = error_errno(_("cannot read ref file '%s'"), iter->path.buf); + ret = error_errno(_("cannot read ref file '%s'"), path); goto cleanup; } @@ -3861,16 +3863,20 @@ static int files_fsck_refs_content(struct ref_store *ref_store, static int files_fsck_refs_name(struct ref_store *ref_store UNUSED, struct fsck_options *o, const char *refname, - struct dir_iterator *iter) + const char *path, + int mode UNUSED) { struct strbuf sb = STRBUF_INIT; + const char *filename; int ret = 0; + filename = basename((char *) path); + /* * Ignore the files ending with ".lock" as they may be lock files * However, do not allow bare ".lock" files. */ - if (iter->basename[0] != '.' && ends_with(iter->basename, ".lock")) + if (filename[0] != '.' && ends_with(filename, ".lock")) goto cleanup; /* @@ -3896,6 +3902,35 @@ static const files_fsck_refs_fn fsck_refs_fn[]= { NULL, }; +static int files_fsck_ref(struct ref_store *ref_store, + struct fsck_options *o, + const char *refname, + const char *path, + int mode) +{ + int ret = 0; + + if (o->verbose) + fprintf_ln(stderr, "Checking %s", refname); + + if (!S_ISREG(mode) && !S_ISLNK(mode)) { + struct fsck_ref_report report = { .path = refname }; + + if (fsck_report_ref(o, &report, + FSCK_MSG_BAD_REF_FILETYPE, + "unexpected file type")) + ret = -1; + goto out; + } + + for (size_t i = 0; fsck_refs_fn[i]; i++) + if (fsck_refs_fn[i](ref_store, o, refname, path, mode)) + ret = -1; + +out: + return ret; +} + static int files_fsck_refs_dir(struct ref_store *ref_store, struct fsck_options *o, struct worktree *wt) @@ -3918,30 +3953,17 @@ static int files_fsck_refs_dir(struct ref_store *ref_store, } while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) { - if (S_ISDIR(iter->st.st_mode)) { + if (S_ISDIR(iter->st.st_mode)) continue; - } else if (S_ISREG(iter->st.st_mode) || - S_ISLNK(iter->st.st_mode)) { - strbuf_reset(&refname); - - if (!is_main_worktree(wt)) - strbuf_addf(&refname, "worktrees/%s/", wt->id); - strbuf_addf(&refname, "refs/%s", iter->relative_path); - if (o->verbose) - fprintf_ln(stderr, "Checking %s", refname.buf); + strbuf_reset(&refname); + if (!is_main_worktree(wt)) + strbuf_addf(&refname, "worktrees/%s/", wt->id); + strbuf_addf(&refname, "refs/%s", iter->relative_path); - for (size_t i = 0; fsck_refs_fn[i]; i++) { - if (fsck_refs_fn[i](ref_store, o, refname.buf, iter)) - ret = -1; - } - } else { - struct fsck_ref_report report = { .path = iter->basename }; - if (fsck_report_ref(o, &report, - FSCK_MSG_BAD_REF_FILETYPE, - "unexpected file type")) - ret = -1; - } + if (files_fsck_ref(ref_store, o, refname.buf, + iter->path.buf, iter->st.st_mode) < 0) + ret = -1; } if (iter_status != ITER_DONE) From 9ebccf744a967f399579a3f3ffbeb40120f5a1e1 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:55 +0100 Subject: [PATCH 36/50] refs/files: improve error handling when verifying symrefs The error handling when verifying symbolic refs is a bit on the wild side: - `fsck_report_ref()` can be told to ignore specific errors. If an error has been ignored and a previous check raised an unignored error, then assigning `ret = fsck_report_ref()` will cause us to swallow the previous error. - When the target reference is not valid we bail out early without checking for other errors. Fix both of these issues by consistently or'ing the return value and not bailing out early. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/files-backend.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 9972221f9f4819..abc21653395b13 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3737,17 +3737,15 @@ static int files_fsck_symref_target(struct fsck_options *o, if (!is_referent_root && !starts_with(referent->buf, "refs/") && !starts_with(referent->buf, "worktrees/")) { - ret = fsck_report_ref(o, report, - FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF, - "points to non-ref target '%s'", referent->buf); - + ret |= fsck_report_ref(o, report, + FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF, + "points to non-ref target '%s'", referent->buf); } if (!is_referent_root && check_refname_format(referent->buf, 0)) { - ret = fsck_report_ref(o, report, - FSCK_MSG_BAD_REFERENT_NAME, - "points to invalid refname '%s'", referent->buf); - goto out; + ret |= fsck_report_ref(o, report, + FSCK_MSG_BAD_REFERENT_NAME, + "points to invalid refname '%s'", referent->buf); } if (symbolic_link) @@ -3755,19 +3753,19 @@ static int files_fsck_symref_target(struct fsck_options *o, if (referent->len == orig_len || (referent->len < orig_len && orig_last_byte != '\n')) { - ret = fsck_report_ref(o, report, - FSCK_MSG_REF_MISSING_NEWLINE, - "misses LF at the end"); + ret |= fsck_report_ref(o, report, + FSCK_MSG_REF_MISSING_NEWLINE, + "misses LF at the end"); } if (referent->len != orig_len && referent->len != orig_len - 1) { - ret = fsck_report_ref(o, report, - FSCK_MSG_TRAILING_REF_CONTENT, - "has trailing whitespaces or newlines"); + ret |= fsck_report_ref(o, report, + FSCK_MSG_TRAILING_REF_CONTENT, + "has trailing whitespaces or newlines"); } out: - return ret; + return ret ? -1 : 0; } static int files_fsck_refs_content(struct ref_store *ref_store, From 7b8c36a2a74868b6fa47e3b11e2c1c0f89c88d43 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:56 +0100 Subject: [PATCH 37/50] refs/files: perform consistency checks for root refs While the "files" backend already knows to perform consistency checks for the "refs/" hierarchy, it doesn't verify any of its root refs. Plug this omission. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/files-backend.c | 50 +++++++++++++++++++++++++++++++++++++--- t/t0602-reffiles-fsck.sh | 30 ++++++++++++++++++++++++ 2 files changed, 77 insertions(+), 3 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index abc21653395b13..9ae80b700aa315 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3877,9 +3877,9 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED, if (filename[0] != '.' && ends_with(filename, ".lock")) goto cleanup; - /* - * This works right now because we never check the root refs. - */ + if (is_root_ref(refname)) + goto cleanup; + if (check_refname_format(refname, 0)) { struct fsck_ref_report report = { 0 }; @@ -3974,19 +3974,63 @@ static int files_fsck_refs_dir(struct ref_store *ref_store, return ret; } +struct files_fsck_root_ref_data { + struct files_ref_store *refs; + struct fsck_options *o; + struct worktree *wt; + struct strbuf refname; + struct strbuf path; +}; + +static int files_fsck_root_ref(const char *refname, void *cb_data) +{ + struct files_fsck_root_ref_data *data = cb_data; + struct stat st; + + strbuf_reset(&data->refname); + if (!is_main_worktree(data->wt)) + strbuf_addf(&data->refname, "worktrees/%s/", data->wt->id); + strbuf_addstr(&data->refname, refname); + + strbuf_reset(&data->path); + strbuf_addf(&data->path, "%s/%s", data->refs->gitcommondir, data->refname.buf); + + if (stat(data->path.buf, &st)) { + if (errno == ENOENT) + return 0; + return error_errno("failed to read ref: '%s'", data->path.buf); + } + + return files_fsck_ref(&data->refs->base, data->o, data->refname.buf, + data->path.buf, st.st_mode); +} + static int files_fsck(struct ref_store *ref_store, struct fsck_options *o, struct worktree *wt) { struct files_ref_store *refs = files_downcast(ref_store, REF_STORE_READ, "fsck"); + struct files_fsck_root_ref_data data = { + .refs = refs, + .o = o, + .wt = wt, + .refname = STRBUF_INIT, + .path = STRBUF_INIT, + }; int ret = 0; if (files_fsck_refs_dir(ref_store, o, wt) < 0) ret = -1; + + if (for_each_root_ref(refs, files_fsck_root_ref, &data) < 0) + ret = -1; + if (refs->packed_ref_store->be->fsck(refs->packed_ref_store, o, wt) < 0) ret = -1; + strbuf_release(&data.refname); + strbuf_release(&data.path); return ret; } diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index 0ef483659d561f..479f3d528e6ca8 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -905,4 +905,34 @@ test_expect_success '--[no-]references option should apply to fsck' ' ) ' +test_expect_success 'complains about broken root ref' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + echo "ref: refs/../HEAD" >.git/HEAD && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: HEAD: badReferentName: points to invalid refname ${SQ}refs/../HEAD${SQ} + EOF + test_cmp expect err + ) +' + +test_expect_success 'complains about broken root ref in worktree' ' + test_when_finished "rm -rf repo worktree" && + git init repo && + ( + cd repo && + test_commit initial && + git worktree add ../worktree && + echo "ref: refs/../HEAD" >.git/worktrees/worktree/HEAD && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: worktrees/worktree/HEAD: badReferentName: points to invalid refname ${SQ}refs/../HEAD${SQ} + EOF + test_cmp expect err + ) +' + test_done From 70b338d60c8d90a56a43fa69fd49778b94b0de72 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:57 +0100 Subject: [PATCH 38/50] fsck: drop unused fields from `struct fsck_ref_report` The `struct fsck_ref_report` has a couple fields that are intended to improve the error reporting for broken ref reports by showing which object ID or target reference the ref points to. These fields are never set though and are thus essentially unused. Remove them. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- fsck.c | 5 ----- fsck.h | 2 -- 2 files changed, 7 deletions(-) diff --git a/fsck.c b/fsck.c index fae18d8561e067..813d927d57d4b0 100644 --- a/fsck.c +++ b/fsck.c @@ -1310,11 +1310,6 @@ int fsck_refs_error_function(struct fsck_options *options UNUSED, strbuf_addstr(&sb, report->path); - if (report->oid) - strbuf_addf(&sb, " -> (%s)", oid_to_hex(report->oid)); - else if (report->referent) - strbuf_addf(&sb, " -> (%s)", report->referent); - if (msg_type == FSCK_WARN) warning("%s: %s", sb.buf, message); else diff --git a/fsck.h b/fsck.h index 336917c0451aac..bfe0d9c6d2e44e 100644 --- a/fsck.h +++ b/fsck.h @@ -162,8 +162,6 @@ struct fsck_object_report { struct fsck_ref_report { const char *path; - const struct object_id *oid; - const char *referent; }; struct fsck_options { From dcecffb616762893c44f98d556493144edbcd498 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:58 +0100 Subject: [PATCH 39/50] refs/files: extract generic symref target checks The consistency checks for the "files" backend contain a couple of verifications for symrefs that verify generic properties of the target reference. These properties need to hold for every backend, no matter whether it's using the "files" or "reftable" backend. Reimplementing these checks for every single backend doesn't really make sense. Extract it into a generic `refs_fsck_symref()` function that can be used by other backends, as well. The "reftable" backend will be wired up in a subsequent commit. While at it, improve the consistency checks so that we don't complain about refs pointing to a non-ref target in case the target refname format does not verify. Otherwise it's very likely that we'll generate both error messages, which feels somewhat redundant in this case. Note that the function has a couple of `UNUSED` parameters. These will become referenced in a subsequent commit. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs.c | 21 +++++++++++++++++ refs.h | 10 ++++++++ refs/files-backend.c | 54 +++++++++++++++++--------------------------- 3 files changed, 52 insertions(+), 33 deletions(-) diff --git a/refs.c b/refs.c index e06e0cb07283d3..739bf9fefc7bbf 100644 --- a/refs.c +++ b/refs.c @@ -320,6 +320,27 @@ int check_refname_format(const char *refname, int flags) return check_or_sanitize_refname(refname, flags, NULL); } +int refs_fsck_symref(struct ref_store *refs UNUSED, struct fsck_options *o, + struct fsck_ref_report *report, + const char *refname UNUSED, const char *target) +{ + if (is_root_ref(target)) + return 0; + + if (check_refname_format(target, 0) && + fsck_report_ref(o, report, FSCK_MSG_BAD_REFERENT_NAME, + "points to invalid refname '%s'", target)) + return -1; + + if (!starts_with(target, "refs/") && + !starts_with(target, "worktrees/") && + fsck_report_ref(o, report, FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF, + "points to non-ref target '%s'", target)) + return -1; + + return 0; +} + int refs_fsck(struct ref_store *refs, struct fsck_options *o, struct worktree *wt) { diff --git a/refs.h b/refs.h index d9051bbb0414c2..d91fcb2d2f0526 100644 --- a/refs.h +++ b/refs.h @@ -653,6 +653,16 @@ int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_dat */ int check_refname_format(const char *refname, int flags); +struct fsck_ref_report; + +/* + * Perform generic checks for a specific symref target. This function is + * expected to be called by the ref backends for every symbolic ref. + */ +int refs_fsck_symref(struct ref_store *refs, struct fsck_options *o, + struct fsck_ref_report *report, + const char *refname, const char *target); + /* * Check the reference database for consistency. Return 0 if refs and * reflogs are consistent, and non-zero otherwise. The errors will be diff --git a/refs/files-backend.c b/refs/files-backend.c index 9ae80b700aa315..687c26ddcbc8c7 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3718,53 +3718,39 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store, const char *path, int mode); -static int files_fsck_symref_target(struct fsck_options *o, +static int files_fsck_symref_target(struct ref_store *ref_store, + struct fsck_options *o, struct fsck_ref_report *report, + const char *refname, struct strbuf *referent, unsigned int symbolic_link) { - int is_referent_root; char orig_last_byte; size_t orig_len; int ret = 0; orig_len = referent->len; orig_last_byte = referent->buf[orig_len - 1]; - if (!symbolic_link) - strbuf_rtrim(referent); - - is_referent_root = is_root_ref(referent->buf); - if (!is_referent_root && - !starts_with(referent->buf, "refs/") && - !starts_with(referent->buf, "worktrees/")) { - ret |= fsck_report_ref(o, report, - FSCK_MSG_SYMREF_TARGET_IS_NOT_A_REF, - "points to non-ref target '%s'", referent->buf); - } - if (!is_referent_root && check_refname_format(referent->buf, 0)) { - ret |= fsck_report_ref(o, report, - FSCK_MSG_BAD_REFERENT_NAME, - "points to invalid refname '%s'", referent->buf); - } + if (!symbolic_link) { + strbuf_rtrim(referent); - if (symbolic_link) - goto out; + if (referent->len == orig_len || + (referent->len < orig_len && orig_last_byte != '\n')) { + ret |= fsck_report_ref(o, report, + FSCK_MSG_REF_MISSING_NEWLINE, + "misses LF at the end"); + } - if (referent->len == orig_len || - (referent->len < orig_len && orig_last_byte != '\n')) { - ret |= fsck_report_ref(o, report, - FSCK_MSG_REF_MISSING_NEWLINE, - "misses LF at the end"); + if (referent->len != orig_len && referent->len != orig_len - 1) { + ret |= fsck_report_ref(o, report, + FSCK_MSG_TRAILING_REF_CONTENT, + "has trailing whitespaces or newlines"); + } } - if (referent->len != orig_len && referent->len != orig_len - 1) { - ret |= fsck_report_ref(o, report, - FSCK_MSG_TRAILING_REF_CONTENT, - "has trailing whitespaces or newlines"); - } + ret |= refs_fsck_symref(ref_store, o, report, refname, referent->buf); -out: return ret ? -1 : 0; } @@ -3807,7 +3793,8 @@ static int files_fsck_refs_content(struct ref_store *ref_store, else strbuf_addbuf(&referent, &ref_content); - ret |= files_fsck_symref_target(o, &report, &referent, 1); + ret |= files_fsck_symref_target(ref_store, o, &report, + target_name, &referent, 1); goto cleanup; } @@ -3847,7 +3834,8 @@ static int files_fsck_refs_content(struct ref_store *ref_store, goto cleanup; } } else { - ret = files_fsck_symref_target(o, &report, &referent, 0); + ret = files_fsck_symref_target(ref_store, o, &report, + target_name, &referent, 0); goto cleanup; } From ae38c3a359f4834d27f511f1fa9b982f5ad55c6a Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:02:59 +0100 Subject: [PATCH 40/50] refs/files: introduce function to perform normal ref checks In a subsequent commit we'll introduce new generic checks for direct refs. These checks will be independent of the actual backend. Introduce a new function `refs_fsck_ref()` that will be used for this purpose. At the current point in time it's still empty, but it will get populated in a subsequent commit. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs.c | 7 +++++++ refs.h | 8 ++++++++ refs/files-backend.c | 2 ++ 3 files changed, 17 insertions(+) diff --git a/refs.c b/refs.c index 739bf9fefc7bbf..4fc1317cb32323 100644 --- a/refs.c +++ b/refs.c @@ -320,6 +320,13 @@ int check_refname_format(const char *refname, int flags) return check_or_sanitize_refname(refname, flags, NULL); } +int refs_fsck_ref(struct ref_store *refs UNUSED, struct fsck_options *o UNUSED, + struct fsck_ref_report *report UNUSED, + const char *refname UNUSED, const struct object_id *oid UNUSED) +{ + return 0; +} + int refs_fsck_symref(struct ref_store *refs UNUSED, struct fsck_options *o, struct fsck_ref_report *report, const char *refname UNUSED, const char *target) diff --git a/refs.h b/refs.h index d91fcb2d2f0526..f0abfa1d93633e 100644 --- a/refs.h +++ b/refs.h @@ -655,6 +655,14 @@ int check_refname_format(const char *refname, int flags); struct fsck_ref_report; +/* + * Perform generic checks for a specific direct ref. This function is + * expected to be called by the ref backends for every symbolic ref. + */ +int refs_fsck_ref(struct ref_store *refs, struct fsck_options *o, + struct fsck_ref_report *report, + const char *refname, const struct object_id *oid); + /* * Perform generic checks for a specific symref target. This function is * expected to be called by the ref backends for every symbolic ref. diff --git a/refs/files-backend.c b/refs/files-backend.c index 687c26ddcbc8c7..240d3c3b26e0b5 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3833,6 +3833,8 @@ static int files_fsck_refs_content(struct ref_store *ref_store, "has trailing garbage: '%s'", trailing); goto cleanup; } + + ret = refs_fsck_ref(ref_store, o, &report, target_name, &oid); } else { ret = files_fsck_symref_target(ref_store, o, &report, target_name, &referent, 0); From ab67f0a43677b56a9cfe3c6b0f1d8fe0f9a988eb Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:03:00 +0100 Subject: [PATCH 41/50] refs/reftable: adapt includes to become consistent Adapt the includes to be sorted and to use include paths that are relative to the "refs/" directory. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/reftable-backend.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 4319a4eacbafc4..d61790cf653506 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -10,9 +10,10 @@ #include "../gettext.h" #include "../hash.h" #include "../hex.h" -#include "../iterator.h" #include "../ident.h" +#include "../iterator.h" #include "../object.h" +#include "../parse.h" #include "../path.h" #include "../refs.h" #include "../reftable/reftable-basics.h" @@ -26,7 +27,6 @@ #include "../strmap.h" #include "../trace2.h" #include "../write-or-die.h" -#include "parse.h" #include "refs-internal.h" /* From 78384e2467c4e457f230e731f26409fa7dd72434 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:03:01 +0100 Subject: [PATCH 42/50] refs/reftable: extract function to retrieve backend for worktree Pull out the logic to retrieve a backend for a given worktree. This function will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/reftable-backend.c | 70 +++++++++++++++++++++++++---------------- 1 file changed, 43 insertions(+), 27 deletions(-) diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index d61790cf653506..dda961a32b532c 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -172,6 +172,37 @@ static struct reftable_ref_store *reftable_be_downcast(struct ref_store *ref_sto return refs; } +static int backend_for_worktree(struct reftable_backend **out, + struct reftable_ref_store *store, + const char *worktree_name) +{ + struct strbuf worktree_dir = STRBUF_INIT; + int ret; + + *out = strmap_get(&store->worktree_backends, worktree_name); + if (*out) { + ret = 0; + goto out; + } + + strbuf_addf(&worktree_dir, "%s/worktrees/%s/reftable", + store->base.repo->commondir, worktree_name); + + CALLOC_ARRAY(*out, 1); + store->err = ret = reftable_backend_init(*out, worktree_dir.buf, + &store->write_options); + if (ret < 0) { + free(*out); + goto out; + } + + strmap_put(&store->worktree_backends, worktree_name, *out); + +out: + strbuf_release(&worktree_dir); + return ret; +} + /* * Some refs are global to the repository (refs/heads/{*}), while others are * local to the worktree (eg. HEAD, refs/bisect/{*}). We solve this by having @@ -191,19 +222,19 @@ static int backend_for(struct reftable_backend **out, const char **rewritten_ref, int reload) { - struct reftable_backend *be; const char *wtname; int wtname_len; + int ret; if (!refname) { - be = &store->main_backend; + *out = &store->main_backend; + ret = 0; goto out; } switch (parse_worktree_ref(refname, &wtname, &wtname_len, rewritten_ref)) { case REF_WORKTREE_OTHER: { static struct strbuf wtname_buf = STRBUF_INIT; - struct strbuf wt_dir = STRBUF_INIT; /* * We're using a static buffer here so that we don't need to @@ -223,20 +254,8 @@ static int backend_for(struct reftable_backend **out, * already and error out when trying to write a reference via * both stacks. */ - be = strmap_get(&store->worktree_backends, wtname_buf.buf); - if (!be) { - strbuf_addf(&wt_dir, "%s/worktrees/%s/reftable", - store->base.repo->commondir, wtname_buf.buf); + ret = backend_for_worktree(out, store, wtname_buf.buf); - CALLOC_ARRAY(be, 1); - store->err = reftable_backend_init(be, wt_dir.buf, - &store->write_options); - assert(store->err != REFTABLE_API_ERROR); - - strmap_put(&store->worktree_backends, wtname_buf.buf, be); - } - - strbuf_release(&wt_dir); goto out; } case REF_WORKTREE_CURRENT: @@ -245,27 +264,24 @@ static int backend_for(struct reftable_backend **out, * main worktree. We thus return the main stack in that case. */ if (!store->worktree_backend.stack) - be = &store->main_backend; + *out = &store->main_backend; else - be = &store->worktree_backend; + *out = &store->worktree_backend; + ret = 0; goto out; case REF_WORKTREE_MAIN: case REF_WORKTREE_SHARED: - be = &store->main_backend; + *out = &store->main_backend; + ret = 0; goto out; default: BUG("unhandled worktree reference type"); } out: - if (reload) { - int ret = reftable_stack_reload(be->stack); - if (ret) - return ret; - } - *out = be; - - return 0; + if (reload && !ret) + ret = reftable_stack_reload((*out)->stack); + return ret; } static int should_write_log(struct reftable_ref_store *refs, const char *refname) From 9341740bea2ce334be412835fdcc0dd31550071a Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:03:02 +0100 Subject: [PATCH 43/50] refs/reftable: fix consistency checks with worktrees The ref consistency checks are driven via `cmd_refs_verify()`. That function loops through all worktrees (including the main worktree) and then checks the ref store for each of them individually. It follows that the backend is expected to only verify refs that belong to the specified worktree. While the "files" backend handles this correctly, the "reftable" backend doesn't. In fact, it completely ignores the passed worktree and instead verifies refs of _all_ worktrees. The consequence is that we'll end up every ref store N times, where N is the number of worktrees. Or rather, that would be the case if we actually iterated through the worktree reftable stacks correctly. But we use `strmap_for_each_entry()` to iterate through the stacks, but the map is in fact not even properly populated. So instead of checking stacks N^2 times, we actually only end up checking the reftable stack of the main worktree. Fix this bug by only verifying the stack of the passed-in worktree and constructing the backends via `backend_for_worktree()`. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/reftable-backend.c | 29 ++++++++++++++--------------- t/t0614-reftable-fsck.sh | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+), 15 deletions(-) diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index dda961a32b532c..6361b270155e38 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -26,6 +26,7 @@ #include "../setup.h" #include "../strmap.h" #include "../trace2.h" +#include "../worktree.h" #include "../write-or-die.h" #include "refs-internal.h" @@ -2762,25 +2763,23 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info *info, } static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o, - struct worktree *wt UNUSED) + struct worktree *wt) { - struct reftable_ref_store *refs; - struct strmap_entry *entry; - struct hashmap_iter iter; - int ret = 0; - - refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck"); - - ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler, - reftable_fsck_verbose_handler, o); + struct reftable_ref_store *refs = + reftable_be_downcast(ref_store, REF_STORE_READ, "fsck"); + struct reftable_backend *backend; - strmap_for_each_entry(&refs->worktree_backends, &iter, entry) { - struct reftable_backend *b = (struct reftable_backend *)entry->value; - ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler, - reftable_fsck_verbose_handler, o); + if (is_main_worktree(wt)) { + backend = &refs->main_backend; + } else { + int ret = backend_for_worktree(&backend, refs, wt->id); + if (ret < 0) + return error(_("reftable stack for worktree '%s' is broken"), + wt->id); } - return ret; + return reftable_fsck_check(backend->stack, reftable_fsck_error_handler, + reftable_fsck_verbose_handler, o); } struct ref_storage_be refs_be_reftable = { diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh index 677eb9143c9de4..4757eb5931dd5a 100755 --- a/t/t0614-reftable-fsck.sh +++ b/t/t0614-reftable-fsck.sh @@ -55,4 +55,36 @@ for TABLE_NAME in "foo-bar-e4d12d59.ref" \ ' done +test_expect_success 'worktree stacks can be verified' ' + test_when_finished "rm -rf repo worktree" && + git init repo && + test_commit -C repo initial && + git -C repo worktree add ../worktree && + + git -C worktree refs verify 2>err && + test_must_be_empty err && + + REFTABLE_DIR=$(git -C worktree rev-parse --git-dir)/reftable && + EXISTING_TABLE=$(head -n1 "$REFTABLE_DIR/tables.list") && + mv "$REFTABLE_DIR/$EXISTING_TABLE" "$REFTABLE_DIR/broken.ref" && + + for d in repo worktree + do + echo "broken.ref" >"$REFTABLE_DIR/tables.list" && + git -C "$d" refs verify 2>err && + cat >expect <<-EOF && + warning: broken.ref: badReftableTableName: invalid reftable table name + EOF + test_cmp expect err && + + echo garbage >"$REFTABLE_DIR/tables.list" && + test_must_fail git -C "$d" refs verify 2>err && + cat >expect <<-EOF && + error: reftable stack for worktree ${SQ}worktree${SQ} is broken + EOF + test_cmp expect err || return 1 + + done +' + test_done From 06d6ead762ba525c5837812e7f509406253cdacd Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:03:03 +0100 Subject: [PATCH 44/50] refs/reftable: introduce generic checks for refs In a preceding commit we have extracted generic checks for both direct and symbolic refs that apply for all backends. Wire up those checks for the "reftable" backend. Note that this is done by iterating through all refs manually with the low-level reftable ref iterator. We explicitly don't want to use the higher-level iterator that is exposed to users of the reftable backend as that iterator may swallow for example broken refs. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- refs/reftable-backend.c | 82 +++++++++++++++++++++++++++++++++++++--- t/t0614-reftable-fsck.sh | 12 ++++++ 2 files changed, 88 insertions(+), 6 deletions(-) diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 6361b270155e38..fe74af73afdb7a 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -2767,19 +2767,89 @@ static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o, { struct reftable_ref_store *refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck"); + struct reftable_ref_iterator *iter = NULL; + struct reftable_ref_record ref = { 0 }; + struct fsck_ref_report report = { 0 }; + struct strbuf refname = STRBUF_INIT; struct reftable_backend *backend; + int ret, errors = 0; if (is_main_worktree(wt)) { backend = &refs->main_backend; } else { - int ret = backend_for_worktree(&backend, refs, wt->id); - if (ret < 0) - return error(_("reftable stack for worktree '%s' is broken"), - wt->id); + ret = backend_for_worktree(&backend, refs, wt->id); + if (ret < 0) { + ret = error(_("reftable stack for worktree '%s' is broken"), + wt->id); + goto out; + } + } + + errors |= reftable_fsck_check(backend->stack, reftable_fsck_error_handler, + reftable_fsck_verbose_handler, o); + + iter = ref_iterator_for_stack(refs, backend->stack, "", NULL, 0); + if (!iter) { + ret = error(_("could not create iterator for worktree '%s'"), wt->id); + goto out; + } + + while (1) { + ret = reftable_iterator_next_ref(&iter->iter, &ref); + if (ret > 0) + break; + if (ret < 0) { + ret = error(_("could not read record for worktree '%s'"), wt->id); + goto out; + } + + strbuf_reset(&refname); + if (!is_main_worktree(wt)) + strbuf_addf(&refname, "worktrees/%s/", wt->id); + strbuf_addstr(&refname, ref.refname); + report.path = refname.buf; + + switch (ref.value_type) { + case REFTABLE_REF_VAL1: + case REFTABLE_REF_VAL2: { + struct object_id oid; + unsigned hash_id; + + switch (reftable_stack_hash_id(backend->stack)) { + case REFTABLE_HASH_SHA1: + hash_id = GIT_HASH_SHA1; + break; + case REFTABLE_HASH_SHA256: + hash_id = GIT_HASH_SHA256; + break; + default: + BUG("unhandled hash ID %d", + reftable_stack_hash_id(backend->stack)); + } + + oidread(&oid, reftable_ref_record_val1(&ref), + &hash_algos[hash_id]); + + errors |= refs_fsck_ref(ref_store, o, &report, ref.refname, &oid); + break; + } + case REFTABLE_REF_SYMREF: + errors |= refs_fsck_symref(ref_store, o, &report, ref.refname, + ref.value.symref); + break; + default: + BUG("unhandled reference value type %d", ref.value_type); + } } - return reftable_fsck_check(backend->stack, reftable_fsck_error_handler, - reftable_fsck_verbose_handler, o); + ret = errors ? -1 : 0; + +out: + if (iter) + ref_iterator_free(&iter->base); + reftable_ref_record_release(&ref); + strbuf_release(&refname); + return ret; } struct ref_storage_be refs_be_reftable = { diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh index 4757eb5931dd5a..d24b87f9611975 100755 --- a/t/t0614-reftable-fsck.sh +++ b/t/t0614-reftable-fsck.sh @@ -87,4 +87,16 @@ test_expect_success 'worktree stacks can be verified' ' done ' +test_expect_success 'invalid symref gets reported' ' + test_when_finished "rm -rf repo" && + git init repo && + test_commit -C repo initial && + git -C repo symbolic-ref refs/heads/symref garbage && + test_must_fail git -C repo refs verify 2>err && + cat >expect <<-EOF && + error: refs/heads/symref: badReferentName: points to invalid refname ${SQ}garbage${SQ} + EOF + test_cmp expect err +' + test_done From 46d611cadab500ca2b458b2fda7008c41b174011 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:03:04 +0100 Subject: [PATCH 45/50] builtin/fsck: move generic object ID checks into `refs_fsck()` While most of the logic that verifies the consistency of refs is driven by `refs_fsck()`, we still have a small handful of checks in `fsck_head_link()`. These checks don't use the git-fsck(1) reporting infrastructure, and as such it's impossible to for example disable some of those checks. One such check detects refs that point to the all-zeroes object ID. Extract this check into the generic `refs_fsck_ref()` function that is used by both the "files" and "reftable" backends. Note that this will cause us to not return an error code from `fsck_head_link()` anymore in case this error was detected. This is fine though: the only caller of this function does not check the error code anyway. To demonstrate this, adapt the function to drop its return value altogether. The function will be removed in a subsequent commit anyway. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- Documentation/fsck-msgids.adoc | 3 +++ builtin/fsck.c | 41 +++++++++++++--------------------- fsck.h | 1 + refs.c | 11 ++++++--- t/t1450-fsck.sh | 6 ++--- 5 files changed, 30 insertions(+), 32 deletions(-) diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc index acac9683af83f4..76609321f662dd 100644 --- a/Documentation/fsck-msgids.adoc +++ b/Documentation/fsck-msgids.adoc @@ -41,6 +41,9 @@ `badRefName`:: (ERROR) A ref has an invalid format. +`badRefOid`:: + (ERROR) A ref points to an invalid object ID. + `badReferentName`:: (ERROR) The referent name of a symref is invalid. diff --git a/builtin/fsck.c b/builtin/fsck.c index 4979bc795e5d61..4dd4d74d1e15da 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -564,9 +564,9 @@ static int fsck_handle_ref(const struct reference *ref, void *cb_data UNUSED) return 0; } -static int fsck_head_link(const char *head_ref_name, - const char **head_points_at, - struct object_id *head_oid); +static void fsck_head_link(const char *head_ref_name, + const char **head_points_at, + struct object_id *head_oid); static void get_default_heads(void) { @@ -713,12 +713,10 @@ static void fsck_source(struct odb_source *source) stop_progress(&progress); } -static int fsck_head_link(const char *head_ref_name, - const char **head_points_at, - struct object_id *head_oid) +static void fsck_head_link(const char *head_ref_name, + const char **head_points_at, + struct object_id *head_oid) { - int null_is_error = 0; - if (verbose) fprintf_ln(stderr, _("Checking %s link"), head_ref_name); @@ -727,27 +725,18 @@ static int fsck_head_link(const char *head_ref_name, NULL); if (!*head_points_at) { errors_found |= ERROR_REFS; - return error(_("invalid %s"), head_ref_name); + error(_("invalid %s"), head_ref_name); + return; } - if (!strcmp(*head_points_at, head_ref_name)) - /* detached HEAD */ - null_is_error = 1; - else if (!starts_with(*head_points_at, "refs/heads/")) { + if (strcmp(*head_points_at, head_ref_name) && + !starts_with(*head_points_at, "refs/heads/")) { errors_found |= ERROR_REFS; - return error(_("%s points to something strange (%s)"), - head_ref_name, *head_points_at); - } - if (is_null_oid(head_oid)) { - if (null_is_error) { - errors_found |= ERROR_REFS; - return error(_("%s: detached HEAD points at nothing"), - head_ref_name); - } - fprintf_ln(stderr, - _("notice: %s points to an unborn branch (%s)"), - head_ref_name, *head_points_at + 11); + error(_("%s points to something strange (%s)"), + head_ref_name, *head_points_at); + return; } - return 0; + + return; } static int fsck_cache_tree(struct cache_tree *it, const char *index_path) diff --git a/fsck.h b/fsck.h index bfe0d9c6d2e44e..1f472b7daac38c 100644 --- a/fsck.h +++ b/fsck.h @@ -39,6 +39,7 @@ enum fsck_msg_type { FUNC(BAD_REF_CONTENT, ERROR) \ FUNC(BAD_REF_FILETYPE, ERROR) \ FUNC(BAD_REF_NAME, ERROR) \ + FUNC(BAD_REF_OID, ERROR) \ FUNC(BAD_TIMEZONE, ERROR) \ FUNC(BAD_TREE, ERROR) \ FUNC(BAD_TREE_SHA1, ERROR) \ diff --git a/refs.c b/refs.c index 4fc1317cb32323..c3528862c68f63 100644 --- a/refs.c +++ b/refs.c @@ -320,10 +320,15 @@ int check_refname_format(const char *refname, int flags) return check_or_sanitize_refname(refname, flags, NULL); } -int refs_fsck_ref(struct ref_store *refs UNUSED, struct fsck_options *o UNUSED, - struct fsck_ref_report *report UNUSED, - const char *refname UNUSED, const struct object_id *oid UNUSED) +int refs_fsck_ref(struct ref_store *refs UNUSED, struct fsck_options *o, + struct fsck_ref_report *report, + const char *refname UNUSED, const struct object_id *oid) { + if (is_null_oid(oid)) + return fsck_report_ref(o, report, FSCK_MSG_BAD_REF_OID, + "points to invalid object ID '%s'", + oid_to_hex(oid)); + return 0; } diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index c4b651c2dc7938..900c1b2eb258c1 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -105,7 +105,7 @@ test_expect_success REFFILES 'HEAD link pointing at a funny object' ' echo $ZERO_OID >.git/HEAD && # avoid corrupt/broken HEAD from interfering with repo discovery test_must_fail env GIT_DIR=.git git fsck 2>out && - test_grep "detached HEAD points" out + test_grep "HEAD: badRefOid: points to invalid object ID ${SQ}$ZERO_OID${SQ}" out ' test_expect_success 'HEAD link pointing at a funny place' ' @@ -123,7 +123,7 @@ test_expect_success REFFILES 'HEAD link pointing at a funny object (from differe echo $ZERO_OID >.git/HEAD && # avoid corrupt/broken HEAD from interfering with repo discovery test_must_fail git -C wt fsck 2>out && - test_grep "main-worktree/HEAD: detached HEAD points" out + test_grep "HEAD: badRefOid: points to invalid object ID ${SQ}$ZERO_OID${SQ}" out ' test_expect_success REFFILES 'other worktree HEAD link pointing at a funny object' ' @@ -131,7 +131,7 @@ test_expect_success REFFILES 'other worktree HEAD link pointing at a funny objec git worktree add other && echo $ZERO_OID >.git/worktrees/other/HEAD && test_must_fail git fsck 2>out && - test_grep "worktrees/other/HEAD: detached HEAD points" out + test_grep "worktrees/other/HEAD: badRefOid: points to invalid object ID ${SQ}$ZERO_OID${SQ}" out ' test_expect_success 'other worktree HEAD link pointing at missing object' ' From 9727336b31c055e4507248703b8a4a8ed039dc06 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:03:05 +0100 Subject: [PATCH 46/50] builtin/fsck: move generic HEAD check into `refs_fsck()` Move the check that detects "HEAD" refs that do not point at a branch into `refs_fsck()`. This follows the same motivation as the preceding commit. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- Documentation/fsck-msgids.adoc | 3 +++ builtin/fsck.c | 7 ------- fsck.h | 1 + refs.c | 12 +++++++++++- t/t0602-reffiles-fsck.sh | 8 ++++---- t/t1450-fsck.sh | 4 ++-- 6 files changed, 21 insertions(+), 14 deletions(-) diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc index 76609321f662dd..6a4db3a9916ea6 100644 --- a/Documentation/fsck-msgids.adoc +++ b/Documentation/fsck-msgids.adoc @@ -13,6 +13,9 @@ `badGpgsig`:: (ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header. +`badHeadTarget`:: + (ERROR) The `HEAD` ref is a symref that does not refer to a branch. + `badHeaderContinuation`:: (ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated. diff --git a/builtin/fsck.c b/builtin/fsck.c index 4dd4d74d1e15da..5dda441f45f3f5 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -728,13 +728,6 @@ static void fsck_head_link(const char *head_ref_name, error(_("invalid %s"), head_ref_name); return; } - if (strcmp(*head_points_at, head_ref_name) && - !starts_with(*head_points_at, "refs/heads/")) { - errors_found |= ERROR_REFS; - error(_("%s points to something strange (%s)"), - head_ref_name, *head_points_at); - return; - } return; } diff --git a/fsck.h b/fsck.h index 1f472b7daac38c..65ecbb7fe194ab 100644 --- a/fsck.h +++ b/fsck.h @@ -30,6 +30,7 @@ enum fsck_msg_type { FUNC(BAD_DATE_OVERFLOW, ERROR) \ FUNC(BAD_EMAIL, ERROR) \ FUNC(BAD_GPGSIG, ERROR) \ + FUNC(BAD_HEAD_TARGET, ERROR) \ FUNC(BAD_NAME, ERROR) \ FUNC(BAD_OBJECT_SHA1, ERROR) \ FUNC(BAD_PACKED_REF_ENTRY, ERROR) \ diff --git a/refs.c b/refs.c index c3528862c68f63..a772d371cd5938 100644 --- a/refs.c +++ b/refs.c @@ -334,8 +334,18 @@ int refs_fsck_ref(struct ref_store *refs UNUSED, struct fsck_options *o, int refs_fsck_symref(struct ref_store *refs UNUSED, struct fsck_options *o, struct fsck_ref_report *report, - const char *refname UNUSED, const char *target) + const char *refname, const char *target) { + const char *stripped_refname; + + parse_worktree_ref(refname, NULL, NULL, &stripped_refname); + + if (!strcmp(stripped_refname, "HEAD") && + !starts_with(target, "refs/heads/") && + fsck_report_ref(o, report, FSCK_MSG_BAD_HEAD_TARGET, + "HEAD points to non-branch '%s'", target)) + return -1; + if (is_root_ref(target)) return 0; diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index 479f3d528e6ca8..3c1f553b8120ec 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -910,10 +910,10 @@ test_expect_success 'complains about broken root ref' ' git init repo && ( cd repo && - echo "ref: refs/../HEAD" >.git/HEAD && + echo "ref: refs/heads/../HEAD" >.git/HEAD && test_must_fail git refs verify 2>err && cat >expect <<-EOF && - error: HEAD: badReferentName: points to invalid refname ${SQ}refs/../HEAD${SQ} + error: HEAD: badReferentName: points to invalid refname ${SQ}refs/heads/../HEAD${SQ} EOF test_cmp expect err ) @@ -926,10 +926,10 @@ test_expect_success 'complains about broken root ref in worktree' ' cd repo && test_commit initial && git worktree add ../worktree && - echo "ref: refs/../HEAD" >.git/worktrees/worktree/HEAD && + echo "ref: refs/heads/../HEAD" >.git/worktrees/worktree/HEAD && test_must_fail git refs verify 2>err && cat >expect <<-EOF && - error: worktrees/worktree/HEAD: badReferentName: points to invalid refname ${SQ}refs/../HEAD${SQ} + error: worktrees/worktree/HEAD: badReferentName: points to invalid refname ${SQ}refs/heads/../HEAD${SQ} EOF test_cmp expect err ) diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 900c1b2eb258c1..3fae05f9d9f805 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -113,7 +113,7 @@ test_expect_success 'HEAD link pointing at a funny place' ' test-tool ref-store main create-symref HEAD refs/funny/place && # avoid corrupt/broken HEAD from interfering with repo discovery test_must_fail env GIT_DIR=.git git fsck 2>out && - test_grep "HEAD points to something strange" out + test_grep "HEAD: badHeadTarget: HEAD points to non-branch ${SQ}refs/funny/place${SQ}" out ' test_expect_success REFFILES 'HEAD link pointing at a funny object (from different wt)' ' @@ -148,7 +148,7 @@ test_expect_success 'other worktree HEAD link pointing at a funny place' ' git worktree add other && git -C other symbolic-ref HEAD refs/funny/place && test_must_fail git fsck 2>out && - test_grep "worktrees/other/HEAD points to something strange" out + test_grep "worktrees/other/HEAD: badHeadTarget: HEAD points to non-branch ${SQ}refs/funny/place${SQ}" out ' test_expect_success 'commit with multiple signatures is okay' ' From 8947da018387f146a90e64055b4caf2ab79e39a7 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 12 Jan 2026 10:03:06 +0100 Subject: [PATCH 47/50] builtin/fsck: drop `fsck_head_link()` The function `fsck_head_link()` was historically used to perform a couple of consistency checks for refs. (Almost) all of these checks have now been moved into the refs subsystem. There's only a single check remaining that verifies whether `refs_resolve_ref_unsafe()` returns a `NULL` pointer. This may happen in a couple of cases: - When `refs_is_safe()` declares the ref to be unsafe. We already have checks for this as we verify refnames with `check_refname_format()`. - When the ref doesn't exist. A repository without "HEAD" is completely broken though, and we would notice this error ahead of time already. - In case the caller passes `RESOLVE_REF_READING` and the ref is a symref that doesn't resolve. We don't pass this flag though. As such, this check doesn't cover anything anymore that isn't already covered by `refs_fsck()`. Drop it, which also allows us to inline the call to `refs_resolve_ref_unsafe()`. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/fsck.c | 28 ++++------------------------ 1 file changed, 4 insertions(+), 24 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 5dda441f45f3f5..f104b7af0e15f3 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -564,10 +564,6 @@ static int fsck_handle_ref(const struct reference *ref, void *cb_data UNUSED) return 0; } -static void fsck_head_link(const char *head_ref_name, - const char **head_points_at, - struct object_id *head_oid); - static void get_default_heads(void) { struct worktree **worktrees, **p; @@ -583,7 +579,10 @@ static void get_default_heads(void) struct strbuf refname = STRBUF_INIT; strbuf_worktree_ref(wt, &refname, "HEAD"); - fsck_head_link(refname.buf, &head_points_at, &head_oid); + + head_points_at = refs_resolve_ref_unsafe(get_main_ref_store(the_repository), + refname.buf, 0, &head_oid, NULL); + if (head_points_at && !is_null_oid(&head_oid)) { struct reference ref = { .name = refname.buf, @@ -713,25 +712,6 @@ static void fsck_source(struct odb_source *source) stop_progress(&progress); } -static void fsck_head_link(const char *head_ref_name, - const char **head_points_at, - struct object_id *head_oid) -{ - if (verbose) - fprintf_ln(stderr, _("Checking %s link"), head_ref_name); - - *head_points_at = refs_resolve_ref_unsafe(get_main_ref_store(the_repository), - head_ref_name, 0, head_oid, - NULL); - if (!*head_points_at) { - errors_found |= ERROR_REFS; - error(_("invalid %s"), head_ref_name); - return; - } - - return; -} - static int fsck_cache_tree(struct cache_tree *it, const char *index_path) { int i; From d0cec08d704a44105545e9e65c831fc67f6f1986 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Torsten=20B=C3=B6gershausen?= Date: Mon, 12 Jan 2026 17:25:51 +0100 Subject: [PATCH 48/50] utf8.c: prepare workaround for iconv under macOS 14/15 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MacOS14 (Sonoma) has started to ship an iconv library with bugs. The same bugs exists even in MacOS 15 (Sequoia) A bug report running the Git test suite says: three tests of t3900 fail on macOS 26.1 for me: not ok 17 - ISO-2022-JP should be shown in UTF-8 now not ok 25 - ISO-2022-JP should be shown in UTF-8 now not ok 38 - commit --fixup into ISO-2022-JP from UTF-8 Here's the verbose output of the first one: ================= expecting success of 3900.17 'ISO-2022-JP should be shown in UTF-8 now': compare_with ISO-2022-JP "$TEST_DIRECTORY"/t3900/2-UTF-8.txt --- /Users/x/src/git/t/t3900/2-UTF-8.txt 2024-10-01 19:43:24.605230684 +0000 +++ current 2025-12-08 21:52:45.786161909 +0000 @@ -1,5 +1,5 @@ はれひほふ しているのが、いるので。 -濱浜ほれぷりぽれまびぐりろへ。 +濱浜ほれぷりぽれまび$0$j$m$X!# not ok 17 - ISO-2022-JP should be shown in UTF-8 now 1..17 ================= compare_with runs git show to display a commit message, which in this case here was encoded using ISO-2022-JP and is supposed to be reencoded to UTF-8, but git show only does that half-way -- the "$0$j$m$X!#" part is from the original ISO-2022-JP representation. That botched conversion is done by utf8.c::reencode_string_iconv(). It calls iconv(3) to do the actual work, initially with an output buffer of the same size as the input. If the output needs more space the function enlarges the buffer and calls iconv(3) again. iconv(3) won't tell us how much space it needs, but it will report what part it already managed to convert, so we can increase the buffer and continue from there. ISO-2022-JP has escape codes for switching between character sets, so it's a stateful encoding. I guess the iconv(3) on my machine forgets the state at the end of part one and then messes up part two. [end of citation] Working around the buggy iconv shipped with the OS can be done in two ways: a) Link Git against a different version of iconv b) Improve the handling when iconv needs a larger output buffer a) is already done by default when either Fink [1] or MacPorts [2] or Homebrew [3] is installed. b) is implemented here, in case that no fixed iconv is available: When the output buffer is too short, increase it (as before) and start from scratch (this is new). This workound needs to be enabled with '#define ICONV_RESTART_RESET' and a makefile knob will be added in the next commit Suggested-by: René Scharfe Signed-off-by: Torsten Bögershausen [1] https://www.finkproject.org/ [2] https://www.macports.org/ [3] https://brew.sh/ Signed-off-by: Torsten Bögershausen Signed-off-by: Junio C Hamano --- utf8.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/utf8.c b/utf8.c index 35a02519392e65..96460cc414348b 100644 --- a/utf8.c +++ b/utf8.c @@ -515,6 +515,19 @@ char *reencode_string_iconv(const char *in, size_t insz, iconv_t conv, out = xrealloc(out, outalloc); outpos = out + sofar; outsz = outalloc - sofar - 1; +#ifdef ICONV_RESTART_RESET + /* + * If iconv(3) messes up piecemeal conversions + * then restore the original pointers, sizes, + * and converter state, then retry converting + * the full string using the reallocated buffer. + */ + insz += cp - (iconv_ibp)in; /* Restore insz */ + cp = (iconv_ibp)in; /* original start value */ + outpos = out + bom_len; /* original start value */ + outsz = outalloc - bom_len - 1; /* new len */ + iconv(conv, NULL, NULL, NULL, NULL); /* reset iconv machinery */ +#endif } else { *outpos = '\0'; From d28124151851e42a3bb92963f5b747ad843f33e0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Torsten=20B=C3=B6gershausen?= Date: Mon, 12 Jan 2026 17:25:53 +0100 Subject: [PATCH 49/50] utf8.c: enable workaround for iconv under macOS 14/15 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous commit introduced a workaround in utf8.c to deal with broken iconv implementations. It is enabled when a MacOS version is used that has a buggy iconv library and there is no external library provided (and linked against) from neither MacPorts nor Homebrew nor Fink. For Homebrew, MacPorts and Fink we check if libiconv exist. Introduce 2 new macros: HAS_GOOD_LIBICONV and NEEDS_GOOD_LIBICONV. For Homebrew HAS_GOOD_LIBICONV is set when the libiconv directory exist. MacPorts can be installed with or without libiconv, so check if libiconv.dylib exists (which is a softlink) Fink compiles and installs libiconv by default. Note that a fresh installation of Fink now defaults to /opt/sw. Older versions used /sw as default, so leave the check and setting of BASIC_CFLAGS and BASIC_LDFLAGS as is. For the new default check for the existance of /opt/sw as well. Add a check for /opt/sw/lib/libiconv.dylib which sets HAS_GOOD_LIBICONV Signed-off-by: Torsten Bögershausen Signed-off-by: Junio C Hamano --- Makefile | 16 ++++++++++++++++ config.mak.uname | 1 + 2 files changed, 17 insertions(+) diff --git a/Makefile b/Makefile index b7eba509c6a0ca..8aa489f3b6812f 100644 --- a/Makefile +++ b/Makefile @@ -1687,11 +1687,21 @@ ifeq ($(uname_S),Darwin) BASIC_CFLAGS += -I/sw/include BASIC_LDFLAGS += -L/sw/lib endif + ifeq ($(shell test -d /opt/sw/lib && echo y),y) + BASIC_CFLAGS += -I/opt/sw/include + BASIC_LDFLAGS += -L/opt/sw/lib + ifeq ($(shell test -e /opt/sw/lib/libiconv.dylib && echo y),y) + HAS_GOOD_LIBICONV = Yes + endif + endif endif ifndef NO_DARWIN_PORTS ifeq ($(shell test -d /opt/local/lib && echo y),y) BASIC_CFLAGS += -I/opt/local/include BASIC_LDFLAGS += -L/opt/local/lib + ifeq ($(shell test -e /opt/local/lib/libiconv.dylib && echo y),y) + HAS_GOOD_LIBICONV = Yes + endif endif endif ifndef NO_APPLE_COMMON_CRYPTO @@ -1714,6 +1724,7 @@ endif ifdef USE_HOMEBREW_LIBICONV ifeq ($(shell test -d $(HOMEBREW_PREFIX)/opt/libiconv && echo y),y) ICONVDIR ?= $(HOMEBREW_PREFIX)/opt/libiconv + HAS_GOOD_LIBICONV = Yes endif endif endif @@ -1859,6 +1870,11 @@ ifndef NO_ICONV endif EXTLIBS += $(ICONV_LINK) -liconv endif + ifdef NEEDS_GOOD_LIBICONV + ifndef HAS_GOOD_LIBICONV + BASIC_CFLAGS += -DICONV_RESTART_RESET + endif + endif endif ifdef ICONV_OMITS_BOM BASIC_CFLAGS += -DICONV_OMITS_BOM diff --git a/config.mak.uname b/config.mak.uname index 38b35af366d5fd..3c35ae33a3c0c0 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -157,6 +157,7 @@ ifeq ($(uname_S),Darwin) endif ifeq ($(shell test "$(DARWIN_MAJOR_VERSION)" -ge 24 && echo 1),1) USE_HOMEBREW_LIBICONV = UnfortunatelyYes + NEEDS_GOOD_LIBICONV = UnfortunatelyYes endif # The builtin FSMonitor on MacOS builds upon Simple-IPC. Both require From 83a69f19359e6d9bc980563caca38b2b5729808c Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Tue, 20 Jan 2026 15:22:31 -0800 Subject: [PATCH 50/50] Git 2.53-rc1 Signed-off-by: Junio C Hamano --- Documentation/RelNotes/2.53.0.adoc | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/Documentation/RelNotes/2.53.0.adoc b/Documentation/RelNotes/2.53.0.adoc index 039f613f12a821..7d9ca507f1730b 100644 --- a/Documentation/RelNotes/2.53.0.adoc +++ b/Documentation/RelNotes/2.53.0.adoc @@ -38,6 +38,13 @@ UI, Workflows & Features `--onto` option of "git replay". Test coverage of "git replay" has been improved. + * The split command in "git subtree" (in contrib/) has been taught to + deal better with rebased history. + + * The iconv library on macOS fails to correctly handle stateful + ISO/IEC 2022 encoded strings. Work it around instead of replacing + it wholesale from homebrew. + Performance, Internal Implementation, Development Support etc. -------------------------------------------------------------- @@ -96,6 +103,13 @@ Performance, Internal Implementation, Development Support etc. * Import newer version of "clar", unit testing framework. (merge 84071a6dea ps/clar-integers later to maint). + * The packfile_store data structure is moved from object store to odb + source. + + * The object-info API has been cleaned up. + + * Further preparation to upstream symbolic link support on Windows. + Fixes since v2.52 ----------------- @@ -259,6 +273,13 @@ Fixes since v2.52 "git stash export/import" recently introduced. (merge 02fc44a989 bc/doc-stash-import-export later to maint). + * "git fsck" used inconsistent set of refs to show a confused + warning, which has been corrected. + + * Some error messages from the http transport layer lacked the + terminating newline, which has been corrected. + (merge a8227ae8d5 kt/http-backend-errors later to maint). + * Other code cleanup, docfix, build fix, etc. (merge 46207a54cc qj/doc-http-bad-want-response later to maint). (merge df90eccd93 kh/doc-commit-extra-references later to maint). @@ -288,3 +309,5 @@ Fixes since v2.52 (merge 6c5c7e7071 ac/t1420-use-more-direct-check later to maint). (merge 2ac93bfcbc ds/builtin-doc-update later to maint). (merge 3f051fc9c9 kh/doc-patch-id later to maint). + (merge 555c8464e5 je/doc-reset later to maint). + (merge 220f888d7e ps/t1410-cleanup later to maint).