Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seen #4

Open
wants to merge 257 commits into
base: master
Choose a base branch
from
Open

Seen #4

wants to merge 257 commits into from

Conversation

dyrone
Copy link
Owner

@dyrone dyrone commented Aug 17, 2023

Thanks for taking the time to contribute to Git! Please be advised that the
Git community does not use github.com for their contributions. Instead, we use
a mailing list ([email protected]) for code submissions, code reviews, and
bug reports. Nevertheless, you can use GitGitGadget (https://gitgitgadget.github.io/)
to conveniently send your Pull Requests commits to our mailing list.

Please read the "guidelines for contributing" linked above!

avar and others added 30 commits June 8, 2021 09:55
Since the "test-tool read-cache" was originally added back in
1ecb5ff (read-cache: add simple performance test, 2013-06-09) it's
been growing all sorts of bells and whistles that aren't very
conducive to performance testing the index, e.g. it learned how to
read config.

Then in recent changes in e2df6c3 (test-read-cache: print cache
entries with --table, 2021-03-30) and 2782db3 (test-tool: don't
force full index, 2021-03-30) we gained even more logic to deal with
sparse index testing.

I think that having one test tool do so many different things makes it
harder to read its code. Let's instead split up the "again" and "perf"
uses for it into their own tools.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Change the newly added (but then mostly copy/pasted) read-cache-perf
to use the parse_options() API. This will make things easier as we add
new options.

Since we check the "cnt = < 1" case now via more idiomatic
post-parse_options() assertions we can move from the for-loop to a
while-loop and ditch the "i" variable.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Change the newly added (but then mostly copy/pasted) read-cache-perf
to use the parse_options() API. I have no plans to further modify
read-cache-again, but making these commands consistent has a value in
and of itself.

Since we check the "cnt = < 1" case now via more idiomatic
post-parse_options() assertions we can move from the for-loop to a
while-loop and ditch the "i" variable.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Add a perf test for the refresh_index() function to compliment the
existing read()/discard() in a loop perf test added in
1ecb5ff (read-cache: add simple performance test, 2013-06-09).

Since this test is much slower (around 10x) than the previous
read()/discard() test let's run it 100 times instead of the 1000 time
the first one runs.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Move the is_repository_shallow() added in b790e0f (upload-pack:
send shallow info over stdin to pack-objects, 2014-03-11) to above
setup_revisions().

Running is_repository_shallow() before setup_revisions() doesn't
matter now, but in subsequent commits we'll make the code that
followed setup_revisions() happen inside a callback in that
function. This isolated change documents that re-arranging this part
of the code is OK in isolation.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Current implementation of `xdg_config_home(filename)` returns
`$XDG_CONFIG_HOME/git/$filename`, with the `git` subdirectory inserted
between the `XDG_CONFIG_HOME` environment variable and the parameter.

This patch introduces a `xdg_config_home_for(subdir, filename)` function
which is more generic. It only concatenates "$XDG_CONFIG_HOME", or
"$HOME/.config" if the former isn’t defined, with the parameters,
without adding `git` in between.

`xdg_config_home(filename)` is now implemented by calling
`xdg_config_home_for("git", filename)` but this new generic function can
be used to compute the configuration directory of other programs.

Signed-off-by: Lénaïc Huard <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Since the inspect() helper in the submodule-basic test suite was
written, 'git -C <dir>' was added. By using -C, we no longer need a
reference to the base directory for the test. This simplifies callsites,
and will make the addition of other arguments in later patches more
readable.

Signed-off-by: Emily Shaffer <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
"zdiff3" is identical to ordinary diff3 except that it allows compaction
of common lines on the two sides of history at the beginning or end of
the conflict hunk.  For example, the following diff3 conflict:

    1
    2
    3
    4
    <<<<<<
    A
    B
    C
    D
    E
    ||||||
    5
    6
    ======
    A
    X
    C
    Y
    E
    >>>>>>
    7
    8
    9

has common lines 'A', 'C', and 'E' on the two sides.  With zdiff3, one
would instead get the following conflict:

    1
    2
    3
    4
    A
    <<<<<<
    B
    C
    D
    ||||||
    5
    6
    ======
    X
    C
    Y
    >>>>>>
    E
    7
    8
    9

Note that the common lines, 'A', and 'E' were moved outside the
conflict.  Unlike with the two-way conflicts from the 'merge'
conflictStyle, the zdiff3 conflict is NOT split into multiple conflict
regions to allow the common 'C' lines to be shown outside a conflict,
because zdiff3 shows the base version too and the base version cannot be
reasonably split.

Initial-patch-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Elijah Newren <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Signed-off-by: Elijah Newren <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Teach submodules a reference to their superproject's gitdir. This allows
us to A) know that we're running from a submodule, and B) have a
shortcut to the superproject's vitals, for example, configs.

By using a relative path instead of an absolute path, we can move the
superproject directory around on the filesystem without breaking the
submodule's cache.

Since this cached value is only introduced during new submodule creation
via `git submodule add`, though, there is more work to do to allow the
cache to be created at other times.

Signed-off-by: Emily Shaffer <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Already during 'git submodule add' we cache a pointer to the
superproject's gitdir. However, this doesn't help brand-new
submodules created with 'git init' and later absorbed with 'git
submodule absorbgitdir'. Let's start adding that pointer during 'git
submodule absorbgitdir' too.

Signed-off-by: Emily Shaffer <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
A cached path to the superproject's gitdir might be added during 'git
submodule add', but in some cases - like submodules which were created
before 'git submodule add' learned to cache that info - it might be
useful to update the cache. Let's do it during 'git submodule update',
when we already have a handle to the superproject while calling
operations on the submodules.

Signed-off-by: Emily Shaffer <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Depending on the system, different schedulers can be used to schedule
the hourly, daily and weekly executions of `git maintenance run`:
* `launchctl` for MacOS,
* `schtasks` for Windows and
* `crontab` for everything else.

`git maintenance run` now has an option to let the end-user explicitly
choose which scheduler he wants to use:
`--scheduler=auto|crontab|launchctl|schtasks`.

When `git maintenance start --scheduler=XXX` is run, it not only
registers `git maintenance run` tasks in the scheduler XXX, it also
removes the `git maintenance run` tasks from all the other schedulers to
ensure we cannot have two schedulers launching concurrent identical
tasks.

The default value is `auto` which chooses a suitable scheduler for the
system.

`git maintenance stop` doesn't have any `--scheduler` parameter because
this command will try to remove the `git maintenance run` tasks from all
the available schedulers.

Signed-off-by: Lénaïc Huard <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
The existing mechanism for scheduling background maintenance is done
through cron. On Linux systems managed by systemd, systemd provides an
alternative to schedule recurring tasks: systemd timers.

The main motivations to implement systemd timers in addition to cron
are:
* cron is optional and Linux systems running systemd might not have it
  installed.
* The execution of `crontab -l` can tell us if cron is installed but not
  if the daemon is actually running.
* With systemd, each service is run in its own cgroup and its logs are
  tagged by the service inside journald. With cron, all scheduled tasks
  are running in the cron daemon cgroup and all the logs of the
  user-scheduled tasks are pretended to belong to the system cron
  service.
  Concretely, a user that doesn’t have access to the system logs won’t
  have access to the log of their own tasks scheduled by cron whereas
  they will have access to the log of their own tasks scheduled by
  systemd timer.
  Although `cron` attempts to send email, that email may go unseen by
  the user because these days, local mailboxes are not heavily used
  anymore.

In order to schedule git maintenance, we need two unit template files:
* ~/.config/systemd/user/[email protected]
  to define the command to be started by systemd and
* ~/.config/systemd/user/[email protected]
  to define the schedule at which the command should be run.

Those units are templates that are parameterized by the frequency.

Based on those templates, 3 timers are started:
* [email protected]
* [email protected]
* [email protected]

The command launched by those three timers are the same as with the
other scheduling methods:

/path/to/git for-each-repo --exec-path=/path/to
--config=maintenance.repo maintenance run --schedule=%i

with the full path for git to ensure that the version of git launched
for the scheduled maintenance is the same as the one used to run
`maintenance start`.

The timer unit contains `Persistent=true` so that, if the computer is
powered down when a maintenance task should run, the task will be run
when the computer is back powered on.

Signed-off-by: Lénaïc Huard <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
When the user recursively clones a repository with submodules and one or
more of those submodules is marked with the submodule.<name>.update=none
configuration, the submodule will end up being active.  This is a
problem because we will have skipped cloning or checking out the
submodule, and as a result, other commands, such as git reset or git
checkout, will fail if they are invoked with --recurse-submodules (or
when submodule.recurse is true).

This is obviously not the behavior the user wanted, so let's fix this by
specifically setting the submodule as inactive in this case when we're
initializing the repository.  That will make us properly ignore the
submodule when performing recursive operations.

We only do this when initializing a submodule, since git submodule
update can update the submodule with various options despite the setting
of "none" and we want those options to override it as they currently do.

Reported-by: Rose Kunkel <[email protected]>
Helped-by: Philippe Blain <[email protected]>
Signed-off-by: brian m. carlson <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
branch_get("HEAD") can return NULL, when HEAD is detached, and cause
the code to segfault when the user calls "git pull --set-upstream".

Catch this case and warn the user to avoid a segfault.

Signed-off-by: Clemens Fruhwirth <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Change the two "disable_stdin" and "read_from_stdin" flags to an enum,
in preparation for a subsequent commit adding more flags.

The interaction between these is more subtle than they might appear at
first sight, as noted in a12cbe2. "read_stdin" is not the inverse
of "disable_stdin", rather we read stdin if we see the "--stdin"
option.

The "read" is intended to understood as "I've read it already", not
"you should read it". Let's avoid this confusion by using "consume"
and "consumed" instead, i.e. a word whose present and past tense isn't
the same.

See 8b3dce5 (Teach --stdin option to "log" family, 2009-11-03)
where we added the "disable_stdin" flag, and a12cbe2 (rev-list:
make empty --stdin not an error, 2018-08-22) for the addition of the
"read_from_stdin" flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Extend the rev_info stdin parsing API to support hooking into its
read_revisions_from_stdin() function, in the next commit we'll change
the the custom stdin parsing in pack-objects.c to use it..

For that use-case adding API is barely justified. We'll be able to
make the handle_revision_arg() static in exchange for it, and we'll
avoid the duplicate dance around setting "save_warning" and
"warn_on_object_refname_ambiguity", but we could just continue to do
that ourselves in builtin/pack-objects.c

The real reason to add this is for a change not part of this
series. We'll soon teach "git bundle create" to accept
revision/refname pairs on stdin, and thus do away with the limitation
of it being impossible to create bundles with ref tips that don't
correspond to the state of the current repository. I.e. this will
work:

    $ printf "e83c5163316f89bfbde7\trefs/heads/first-git-dot-git-commit\n" |
    git bundle create initial.bundle --stdin

As well as things like:

    $ git for-each-ref 'refs/remotes/origin' |
    sed 's!\trefs/remotes/origin/!\trefs/heads/!' |
    git bundle create origin.bundle --stdin

In order to do that we'll need to modify the lines we consume on stdin
revision.c (which bundle.c uses already), and be assured that that
stripping extra bundle-specific data from them is the only change in
behavior.

That change will be much more complex if bundle.c needs to start doing
its own stdin parsing again outside of revision.c, it was recently
converted to use revision.c's parsing in 5bb0fd2 (bundle:
arguments can be read from stdin, 2021-01-11)

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Use the new "handle_stdin_line" API in revision.c to parse stdin in
pack-objects.c, instead of using custom pack-objects.c-specific code
to do so.

This means that we can remove the "if (len && line[len - 1] == '\n')"
check, it's now redundant to using strbuf_getline(), and we get to
skip the whole "warn_on_object_refname_ambiguity" dance. The
read_revisions_from_stdin() function in revision.c we're now using
does it for us.

The pack-objects.c code being refactored away here was first added in
Linus's c323ac7 (git-pack-objects: create a packed object
representation., 2005-06-25).

Later on rev-list started doing similar parsing in 42cabc3 (Teach
rev-list an option to read revs from the standard input., 2006-09-05).
That code was promoted to a more general API in 1fc561d (Move
read_revisions_from_stdin from builtin-rev-list.c to revision.c,
2008-07-05).

Since then the API in revision.c has received improvements that have
been missed here. E.g. the arbitrary limit of 1000 bytes was removed
in 63d564b (read_revision_from_stdin(): use strbuf, 2009-11-20),
and it moved to a more simpler strbuf API in 6e8d46f (revision:
read --stdin with strbuf_getline(), 2015-10-28).

For now we've just made setup_revisions() loop over stdin for us, but
the callback we define makes no use of REV_INFO_STDIN_LINE_PROCESS. We
still need to call handle_revision_arg() ourselves because we'd like
to call it with different flags.

This very light use of the API will be further refined in a subsequent
commit, for now we're just doing the bare minimum to move this
existing code over to the new callback pattern without any functional
changes, and making it as friendly to "git show -w" and "the
--color-moved-ws=allow-indentation-change" mode as possible.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Continue the work started in the preceding commit of porting
pack-objects.c over to the new handle_stdin_line callback pattern. The
common case for the users of this API is to do some of their own
parsing or munging, and then have handle_revision_arg() handle the
rest.

The existing users of the --stdin parsing always wanted a flag of "0"
to be passed to handle_revision_arg(), but pack-objects.c wants to set
custom flags. Let's support this common case by having a
"revarg_flags" member in the "rev_info" struct.

This allows us to return REV_INFO_STDIN_LINE_PROCESS in the new
get_object_list_handle_stdin_line() instead of
REV_INFO_STDIN_LINE_CONTINUE, as read_revisions_from_stdin() will now
pass down the right flag for us.

I considered making the "--not" parsing be another flag handled by the
revision.c API itself, but since there's only one caller who wants
this, and the "write_bitmap_index = 0" case is more specific than
marking things UNINTERESTING I think it's better to handle it with a
more general mechanism.

These changes means that we can make the handle_revision_arg()
function static. Now that the only external user of the API has been
migrated over to the callback mechanism nothing external to revision.c
needs to call handle_revision_arg() anymore.

That handle_revision_arg() function was made public in a combination
of 5d6f093 (revision.c: allow injecting revision parameters after
setup_revisions()., 2006-09-05) and b5d97e6 (pack-objects: run
rev-list equivalent internally., 2006-09-04).

This change leaves the briefly-used in preceding commits
"void *stdin_line_priv" without any in-tree user, as
builtin/pack-objects.c could be ported over to our new "revarg_flags"
common case.

I'm leaving that "void *stdin_line_priv" in place anyway for two
reasons:

 1. It's a common pattern to allow such a "void *" to be used for
    callback data, so we might as well follow that pattern here in
    anticipation of a future in-tree user.

 2. I have patches for such an in-tree user already in a series
    that'll be submitted after this one. See the reference to "git
    bundle create --stdin" in the commit that added the "handle_stdin_line"
    API.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose, and makes things easier to debug as a
failing test can have their state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test, I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Test for what happens when the -t and -s flags are asked to operate on
a missing object, this extends tests added in 3e370f9 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Fix a blindspot in the tests for the --allow-unknown-type feature
added in 39e4ae3 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03). We should check that
--allow-unknown-type isn't on by default.

Before this change all the tests would succeed if --allow-unknown-type
was on by default, let's fix that by asserting that -t and -s die on a
"garbage" type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Fix a blindspot in the tests for the "rev-list --disk-usage" feature
added in 16950f8 (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) to test for what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/[email protected]/

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Move the declaration of some ancient object functions added in
e.g. c448357 (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2 (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
When the loose_object_info() function returns an error stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5e (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f0344 (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).

Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
gitster added 24 commits August 10, 2021 15:43
* jt/grep-wo-submodule-odb-as-alternate:
  t7814: show lack of alternate ODB-adding
  grep: add repository to OID grep sources
  grep: allocate subrepos on heap
  grep: read submodule entry with explicit repo
  grep: typesafe versions of grep_source_init
  grep: use submodule-ODB-as-alternate lazy-addition
  submodule: lazily add submodule ODBs as alternates
"git maintenance" scheduler learned to use systemd timers as a
possible backend.

* lh/systemd-timers:
  maintenance: add support for systemd timers on Linux
  maintenance: `git maintenance run` learned `--scheduler=<scheduler>`
  cache.h: Introduce a generic "xdg_config_home_for(…)" function
Large part of "git submodule add" gets rewritten in C.

* ar/submodule-add-config:
  submodule--helper: introduce add-config subcommand
More parts of "git submoudle add" has been rewritten in C.

* ar/submodule-add-more:
  submodule--helper: rename compute_submodule_clone_url()
  submodule--helper: remove resolve-relative-url subcommand
  submodule--helper: remove add-config subcommand
  submodule--helper: remove add-clone subcommand
  submodule--helper: convert the bulk of cmd_add() to C
  dir: libify and export helper functions from clone.c
  submodule--helper: remove repeated code in sync_submodule()
  submodule--helper: refactor resolve_relative_url() helper
  submodule--helper: add options for compute_submodule_clone_url()
Usability update for inactive submodules.

Comments?
cf. <[email protected]>
cf. <[email protected]>

* bc/inactive-submodules:
  submodule: mark submodules with update=none as inactive
Tie-break branches that point at the same object in the list of
branches on GitWeb to show the one pointed at by HEAD early.

* gh/gitweb-branch-sort:
  gitweb: use HEAD as secondary sort key in git_get_heads_list()
"git p4" in Python-2 days used to accept a lot more kinds of data
from Perforce server as uninterrupted byte sequence, but after
switching to Python-3, too many things are expected to be in UTF-8,
which broke traditional use cases.

* ao/p4-avoid-decoding:
  git-p4: do not decode data from perforce by default
  git-p4: avoid decoding more data from perforce
Test code shuffling.

* ab/test-tool-cache-cleanup:
  read-cache perf: add a perf test for refresh_index()
  test-tool: migrate read-cache-again to parse_options()
  test-tool: migrate read-cache-perf to parse_options()
  test-tool: split up test-tool read-cache
Introduce handle_stdin_line callback to revision API and uses it.

* ab/pack-objects-stdin:
  pack-objects.c: make use of REV_INFO_STDIN_LINE_PROCESS
  pack-objects.c: do stdin parsing via revision.c's API
  revision.[ch]: add a "handle_stdin_line" API
  revision.h: refactor "disable_stdin" and "read_from_stdin"
  upload-pack: run is_repository_shallow() before setup_revisions()
"Zealous diff3" style of merge conflict presentation has been added.

* en/zdiff3:
  update documentation for new zdiff3 conflictStyle
  xdiff: implement a zealous diff3, or "zdiff3"
A configuration variable in a submodule points at the location of
the superproject it is bound to (RFC).

* es/superproject-aware-submodules:
  SQUASH???
  submodule: cache superproject gitdir during 'update'
  submodule: cache superproject gitdir during absorbgitdirs
  introduce submodule.superprojectGitDir cache
  t7400-submodule-basic: modernize inspect() helper
Code clean-up around "git serve".

* ab/serve-cleanup:
  upload-pack: document and rename --advertise-refs
  serve.[ch]: remove "serve_options", split up --advertise-refs code
  {upload,receive}-pack tests: add --advertise-refs tests
  serve.c: move version line to advertise_capabilities()
  serve: move transfer.advertiseSID check into session_id_advertise()
  serve.[ch]: don't pass "struct strvec *keys" to commands
  serve: use designated initializers
  transport: use designated initializers
  transport: rename "fetch" in transport_vtable to "fetch_refs"
  serve: mark has_capability() as static
Restructuring of (a subset of) Emily's config-based-hooks series,
to demonstrate that a series can be presented as a more logical and
focused progression.

* ab/config-based-hooks-base: (36 commits)
  hooks: fix a TOCTOU in "did we run a hook?" heuristic
  receive-pack: convert receive hooks to hook.h
  post-update: use hook.h library
  receive-pack: convert 'update' hook to hook.h
  hooks: allow callers to capture output
  run-command: allow capturing of collated output
  reference-transaction: use hook.h to run hooks
  hook tests: use a modern style for "pre-push" tests
  hook tests: test for exact "pre-push" hook input
  transport: convert pre-push hook to hook.h
  hook: convert 'post-rewrite' hook in sequencer.c to hook.h
  hook: provide stdin by string_list or callback
  run-command: add stdin callback for parallelization
  am: convert 'post-rewrite' hook to hook.h
  hook: support passing stdin to hooks
  run-command: allow stdin for run_processes_parallel
  run-command: remove old run_hook_{le,ve}() hook API
  receive-pack: convert push-to-checkout hook to hook.h
  read-cache: convert post-index-change to use hook.h
  commit: convert {pre-commit,prepare-commit-msg} hook to hook.h
  ...
"git fsck" has been taught to report mismatch between expected and
actual types of an object better.

* ab/fsck-unexpected-type: (21 commits)
  fsck: report invalid object type-path combinations
  fsck: report invalid types recorded in objects
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: don't hard die on invalid object types
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: guard against future bugs in loose_object_info()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: split up ternary in parse_loose_header()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: add missing braces to loose_object_info()
  object-file.c: make parse_loose_header_extended() public
  object-file.c: don't set "typep" when returning non-zero
  cache.h: move object functions to object-store.h
  cat-file tests: test for current --allow-unknown-type behavior
  cat-file tests: add corrupt loose object test
  rev-list tests: test for behavior with invalid object types
  cat-file tests: test that --allow-unknown-type isn't on by default
  cat-file tests: test for missing object with -t and -s
  fsck tests: add test for fsck-ing an unknown type
  ...
Build clean-up for "make tags" and friends.

* ab/make-tags-cleanup:
  Makefile: normalize clobbering & xargs for tags targets
  Makefile: remove "cscope.out", not "cscope*" in cscope.out target
  Makefile: don't use "FORCE" for tags targets
  Makefile: add QUIET_GEN to "cscope" target
  Makefile: move ".PHONY: cscope" near its target
The reachability bitmap file used to be generated only for a single
pack, but now we've learned to generate bitmaps for history that
span across multiple packfiles.

Comments?

* tb/multi-pack-bitmaps: (25 commits)
  p5326: perf tests for MIDX bitmaps
  p5310: extract full and partial bitmap tests
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t7700: update to work with MIDX bitmap test knob
  t5319: don't write MIDX bitmaps in t5319
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t5326: test multi-pack bitmap behavior
  t/helper/test-read-midx.c: add --checksum mode
  t5310: move some tests to lib-bitmap.sh
  pack-bitmap: write multi-pack bitmaps
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap.c: avoid redundant calls to try_partial_reuse
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  midx: avoid opening multiple MIDXs when writing
  midx: close linked MIDXs, avoid leaking memory
  midx: infer preferred pack when not given one
  midx: reject empty `--preferred-pack`'s
  ...
"git fetch --set-upstream" while on detached HEAD segfaulted
instead of noticing that such an operation did not make sense.

* cf/fetch-set-upstream-while-detached:
  fetch: fix segfault on --set-upstream while on a detached HEAD
Updates to the tests in t0000 to test the test framework.

* ab/lib-subtest:
  test-lib tests: assert 1 exit code, not non-zero
  test-lib tests: refactor common part of check_sub_test_lib_test*()
  test-lib tests: avoid subshell for "test_cmp" for readability
  test-lib tests: assert no copy/pasted mock test code
  test-lib tests: get rid of copy/pasted mock test code
  test-lib tests: don't provide a description for the sub-tests
  test-lib tests: stop using a subshell in write_sub_test_lib_test()
  test-lib tests: split up "write and run" into two functions
  test-lib tests: move "run_sub_test" to a new lib-subtest.sh
Further tweaks on progress API.

* ab/only-single-progress-at-once:
  progress.c: add & assert a "global_progress" variable
  pack-bitmap-write.c: add a missing stop_progress()
  progress.c: add temporary variable from progress struct
  progress.c: stop eagerly fflush(stderr) when not a terminal
  progress.c: call progress_interval() from progress_test_force_update()
  progress.c: move signal handler functions lower
  progress.c tests: test some invalid usage
  progress.c tests: make start/stop verbs on stdin
Use ssh public crypto for object and push-cert signing.

Comments?

* fs/ssh-signing:
  ssh signing: test that gpg fails for unkown keys
  ssh signing: tests for logs, tags & push certs
  ssh signing: duplicate t7510 tests for commits
  ssh signing: verify signatures using ssh-keygen
  ssh signing: provide a textual signing_key_id
  ssh signing: retrieve a default key from ssh-agent
  ssh signing: add ssh key format and signing code
  ssh signing: add test prereqs
  ssh signing: preliminary refactoring and clean-up
Build update.

* cb/makefile-apple-clang:
  build: catch clang that identifies itself as "$VENDOR clang"
  build: clang version may not be followed by extra words
  build: update detect-compiler for newer Xcode version
* ds/sparse-index-ignored-files:
  sparse-checkout: clear tracked sparse dirs
  sparse-index: add SPARSE_INDEX_IGNORE_CONFIG flag
  attr: be careful about sparse directories
  sparse-checkout: create helper methods
  unpack-trees: fix nested sparse-dir search
  sparse-index: silently return when cache tree fails
  sparse-index: silently return when not using cone-mode patterns
  t7519: rewrite sparse index test
When the file named by blame.ignoreRevsFile configuration variable
does not exist, it causes "git blame" to die.  Sometimes, it is
useful if the files named by configuration variables can be made
optional (it would allow ~/.gitconfig to give a filename, and make
it effective only in repositories that have a file with that name).

This uses an extra variable that marks that the variable is
optional.  Yet another alternative is being discussed to define
syntax for "optional" filename values, so that the same mechanism
can be used for not just blame.ignoreRevsFile but other filenames.

Queued just as a reminder of the theme.

* np/blame-ignore-revs-file-may-be-optional:
  blame: add config `blame.ignoreRevsFileIsOptional`
"git upload-pack" which runs on the other side of "git fetch"
forgot to take the ref namespaces into account when handling
want-ref requests.

* ka/want-ref-in-namespace:
  docs: clarify the interaction of transfer.hideRefs and namespaces
  upload-pack.c: treat want-ref relative to namespace
  t5730: introduce fetch command helper
dyrone pushed a commit that referenced this pull request Nov 1, 2023
When t5583-push-branches.sh was originally introduced via 425b4d7
(push: introduce '--branches' option, 2023-05-06), it was not leak-free.
In fact, the test did not even run correctly until 022fbb6 (t5583:
fix shebang line, 2023-05-12), but after applying that patch, we see a
failure at t5583.8:

    ==2529087==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 384 byte(s) in 1 object(s) allocated from:
        #0 0x7fb536330986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x55e07606cbf9 in xrealloc wrapper.c:140
        #2 0x55e075fb6cb3 in prio_queue_put prio-queue.c:42
        #3 0x55e075ec81cb in get_reachable_subset commit-reach.c:917
        #4 0x55e075fe9cce in add_missing_tags remote.c:1518
        #5 0x55e075fea1e4 in match_push_refs remote.c:1665
        git#6 0x55e076050a8e in transport_push transport.c:1378
        #7 0x55e075e2eb74 in push_with_options builtin/push.c:401
        git#8 0x55e075e2edb0 in do_push builtin/push.c:458
        git#9 0x55e075e2ff7a in cmd_push builtin/push.c:702
        git#10 0x55e075d8aaf0 in run_builtin git.c:452
        git#11 0x55e075d8af08 in handle_builtin git.c:706
        git#12 0x55e075d8b12c in run_argv git.c:770
        git#13 0x55e075d8b6a0 in cmd_main git.c:905
        git#14 0x55e075e81f07 in main common-main.c:60
        git#15 0x7fb5360ab6c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
        git#16 0x7fb5360ab784 in __libc_start_main_impl ../csu/libc-start.c:360
        git#17 0x55e075d88f40 in _start (git+0x1ff40) (BuildId: 38ad998b85a535e786129979443630d025ec2453)

    SUMMARY: LeakSanitizer: 384 byte(s) leaked in 1 allocation(s).

This leak was addressed independently via 68b5117 (commit-reach: fix
memory leak in get_reachable_subset(), 2023-06-03), which makes t5583
leak-free.

But t5583 was not in the tree when 68b5117 was written, and the two
only met after the latter was merged back in via 693bde4 (Merge
branch 'mh/commit-reach-get-reachable-plug-leak', 2023-06-20).

At that point, t5583 was leak-free. Let's mark it as such accordingly.

Signed-off-by: Taylor Blau <[email protected]>
Acked-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
dyrone pushed a commit that referenced this pull request Feb 12, 2024
It is tempting to think of "files and directories" of the current
directory as valid inputs to the add and set subcommands of git
sparse-checkout.  However, in non-cone mode, they often aren't and using
them as potential completions leads to *many* forms of confusion:

Issue #1. It provides the *wrong* files and directories.

For
    git sparse-checkout add
we always want to add files and directories not currently in our sparse
checkout, which means we want file and directories not currently present
in the current working tree.  Providing the files and directories
currently present is thus always wrong.

For
    git sparse-checkout set
we have a similar problem except in the subset of cases where we are
trying to narrow our checkout to a strict subset of what we already
have.  That is not a very common scenario, especially since it often
does not even happen to be true for the first use of the command; for
years we required users to create a sparse-checkout via
    git sparse-checkout init
    git sparse-checkout set <args...>
(or use a clone option that did the init step for you at clone time).
The init command creates a minimal sparse-checkout with just the
top-level directory present, meaning the set command has to be used to
expand the checkout.  Thus, only in a special and perhaps unusual cases
would any of the suggestions from normal file and directory completion
be appropriate.

Issue #2: Suggesting patterns that lead to warnings is unfriendly.

If the user specifies any regular file and omits the leading '/', then
the sparse-checkout command will warn the user that their command is
problematic and suggest they use a leading slash instead.

Issue #3: Completion gets confused by leading '/', and provides wrong paths.

Users often want to anchor their patterns to the toplevel of the
repository, especially when listing individual files.  There are a
number of reasons for this, but notably even sparse-checkout encourages
them to do so (as noted above).  However, if users do so (via adding a
leading '/' to their pattern), then bash completion will interpret the
leading slash not as a request for a path at the toplevel of the
repository, but as a request for a path at the root of the filesytem.
That means at best that completion cannot help with such paths, and if
it does find any completions, they are almost guaranteed to be wrong.

Issue #4: Suggesting invalid patterns from subdirectories is unfriendly.

There is no per-directory equivalent to .gitignore with
sparse-checkouts.  There is only a single worktree-global
$GIT_DIR/info/sparse-checkout file.  As such, paths to files must be
specified relative to the toplevel of a repository.  Providing
suggestions of paths that are relative to the current working directory,
as bash completion defaults to, is wrong when the current working
directory is not the worktree toplevel directory.

Issue #5: Paths with special characters will be interpreted incorrectly

The entries in the sparse-checkout file are patterns, not paths.  While
most paths also qualify as patterns (though even in such cases it would
be better for users to not use them directly but prefix them with a
leading '/'), there are a variety of special characters that would need
special escaping beyond the normal shell escaping: '*', '?', '\', '[',
']', and any leading '#' or '!'.  If completion suggests any such paths,
users will likely expect them to be treated as an exact path rather than
as a pattern that might match some number of files other than 1.

However, despite the first four issues, we can note that _if_ users are
using tab completion, then they are probably trying to specify a path in
the index.  As such, we transform their argument into a top-level-rooted
pattern that matches such a file.  For example, if they type:
   git sparse-checkout add Make<TAB>
we could "complete" to
   git sparse-checkout add /Makefile
or, if they ran from the Documentation/technical/ subdirectory:
   git sparse-checkout add m<TAB>
we could "complete" it to:
   git sparse-checkout add /Documentation/technical/multi-pack-index.txt
Note in both cases I use "complete" in quotes, because we actually add
characters both before and after the argument in question, so we are
kind of abusing "bash completions" to be "bash completions AND
beginnings".

The fifth issue is a bit stickier, especially when you consider that we
not only need to deal with escaping issues because of special meanings
of patterns in sparse-checkout & gitignore files, but also that we need
to consider escaping issues due to ls-files needing to sometimes quote
or escape characters, and because the shell needs to escape some
characters.  The multiple interacting forms of escaping could get ugly;
this patch makes no attempt to do so and simply documents that we
decided to not deal with those corner cases for now but at least get the
common cases right.

Signed-off-by: Elijah Newren <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
dyrone pushed a commit that referenced this pull request Feb 12, 2024
The t5309 script triggers a racy false positive with SANITIZE=leak on a
multi-core system. Running with "--stress --run=6" usually fails within
10 seconds or so for me, complaining with something like:

    + git index-pack --fix-thin --stdin
    fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?)

    =================================================================
    ==3904583==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 32 byte(s) in 1 object(s) allocated from:
        #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
        #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
        #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
        #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
        #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
        git#6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
        #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

    SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).
    Aborted

What happens is this:

  0. We construct a bogus pack with a duplicate object in it and trigger
     index-pack.

  1. We spawn a bunch of worker threads to resolve deltas (on my system
     it is 16 threads).

  2. One of the threads sees the duplicate object and bails by calling
     exit(), taking down all of the threads. This is expected and is the
     point of the test.

  3. At the time exit() is called, we may still be spawning threads from
     the main process via pthread_create(). LSan hooks thread creation
     to update its book-keeping; it has to know where each thread's
     stack is (so it can find entry points for reachable memory). So it
     calls pthread_getattr_np() to get information about the new thread.
     That may allocate memory that must be freed with a matching call to
     pthread_attr_destroy(). Probably LSan does that immediately, but
     if you're unlucky enough, the exit() will happen while it's between
     those two calls, and the allocated pthread_attr_t appears as a
     leak.

This isn't a real leak. It's not even in our code, but rather in the
LSan instrumentation code. So we could just ignore it. But the false
positive can cause people to waste time tracking it down.

It's possibly something that LSan could protect against (e.g., cover the
getattr/destroy pair with a mutex, and then in the final post-exit()
check for leaks try to take the same mutex). But I don't know enough
about LSan to say if that's a reasonable approach or not (or if my
analysis is even completely correct).

In the meantime, it's pretty easy to avoid the race by making creation
of the worker threads "atomic". That is, we'll spawn all of them before
letting any of them start to work. That's easy to do because we already
have a work_lock() mutex for handing out that work. If the main process
takes it, then all of the threads will immediately block until we've
finished spawning and released it.

This shouldn't make any practical difference for non-LSan runs. The
thread spawning is quick, and could happen before any worker thread gets
scheduled anyway.

Probably other spots that use threads are subject to the same issues.
But since we have to manually insert locking (and since this really is
kind of a hack), let's not bother with them unless somebody experiences
a similar racy false-positive in practice.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
dyrone pushed a commit that referenced this pull request Jul 18, 2024
When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
is responsible for generating an array of bitmapped_pack structs from
which to perform reuse.

In the multi-pack case, we loop over the MIDXs packs and copy the result
of calling `nth_bitmapped_pack()` to construct the list of reusable
paths.

But we may also want to do pack-reuse over a single pack, either because
we only had one pack to perform reuse over (in the case of single-pack
bitmaps), or because we explicitly asked to do single pack reuse even
with a MIDX[^1].

When this is the case, the array we generate of reusable packs contains
only a single element, which is either (a) the pack attached to the
single-pack bitmap, or (b) the MIDX's preferred pack.

In 795006f (pack-bitmap: gracefully handle missing BTMP chunks,
2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
function and stopped assigning the pack_int_id field when reusing only
the MIDX's preferred pack. This results in an uninitialized read down in
try_partial_reuse() like so:

    ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
    #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
    #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
    #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
    #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
    #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
    git#6 0x55c5ccc8fac8 in run_builtin git.c:474:11

which happens when try_partial_reuse() tries to call
midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.

Avoid the uninitialized read by ensuring that the pack_int_id field is
set in the single-pack reuse case by setting it to either the MIDX
preferred pack's pack_int_id, or '-1', in the case of single-pack
bitmaps.  In the latter case, we never read the pack_int_id field, so
the choice of '-1' is intentional as a "garbage in, garbage out"
measure.

Guard against further regressions in this area by adding a test which
ensures that we do not throw out deltas from the preferred pack as
"cross-pack" due to an uninitialized pack_int_id.

[^1]: This can happen for a couple of reasons, either because the
  repository is configured with 'pack.allowPackReuse=(true|single)', or
  because the MIDX was generated prior to the introduction of the BTMP
  chunk, which contains information necessary to perform multi-pack
  reuse.

Reported-by: Kyle Lippincott <[email protected]>
Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
dyrone pushed a commit that referenced this pull request Jul 18, 2024
Memory sanitizer (msan) is detecting a use of an uninitialized variable
(`size`) in `read_attr_from_index`:

    ==2268==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5651f3416504 in read_attr_from_index git/attr.c:868:11
    #1 0x5651f3415530 in read_attr git/attr.c
    #2 0x5651f3413d74 in bootstrap_attr_stack git/attr.c:968:6
    #3 0x5651f3413d74 in prepare_attr_stack git/attr.c:1004:2
    #4 0x5651f3413d74 in collect_some_attrs git/attr.c:1199:2
    #5 0x5651f3413144 in git_check_attr git/attr.c:1345:2
    git#6 0x5651f34728da in convert_attrs git/convert.c:1320:2
    #7 0x5651f3473425 in would_convert_to_git_filter_fd git/convert.c:1373:2
    git#8 0x5651f357a35e in index_fd git/object-file.c:2630:34
    git#9 0x5651f357aa15 in index_path git/object-file.c:2657:7
    git#10 0x5651f35db9d9 in add_to_index git/read-cache.c:766:7
    git#11 0x5651f35dc170 in add_file_to_index git/read-cache.c:799:9
    git#12 0x5651f321f9b2 in add_files git/builtin/add.c:346:7
    git#13 0x5651f321f9b2 in cmd_add git/builtin/add.c:565:18
    git#14 0x5651f321d327 in run_builtin git/git.c:474:11
    git#15 0x5651f321bc9e in handle_builtin git/git.c:729:3
    git#16 0x5651f321a792 in run_argv git/git.c:793:4
    git#17 0x5651f321a792 in cmd_main git/git.c:928:19
    git#18 0x5651f33dde1f in main git/common-main.c:62:11

The issue exists because `size` is an output parameter from
`read_blob_data_from_index`, but it's only modified if
`read_blob_data_from_index` returns non-NULL. The read of `size` when
calling `read_attr_from_buf` unconditionally may read from an
uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
before reading from `size`, but by then it's already too late: the
uninitialized read will have happened already. Furthermore, there's no
guarantee that the compiler won't reorder things so that it checks
`size` before checking `!buf`.

Make the call to `read_attr_from_buf` conditional on `buf` being
non-NULL, ensuring that `size` is not read if it's never set.

Signed-off-by: Kyle Lippincott <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
dyrone pushed a commit that referenced this pull request Oct 4, 2024
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp
0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      git#6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      git#8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      git#9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      git#10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      git#11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      git#12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      git#13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      git#14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      git#15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      git#16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      git#17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      git#18 0x55f23dba7764 in run_builtin git.c:484
      git#19 0x55f23dba7764 in handle_builtin git.c:741
      git#20 0x55f23dbab61e in run_argv git.c:805
      git#21 0x55f23dbab61e in cmd_main git.c:1000
      git#22 0x55f23dba4781 in main common-main.c:64
      git#23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      git#24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      git#25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.

Reported-by: Jeff King <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.