Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have CI check if we need to commit regenerated archives #1590

Merged
merged 6 commits into from
Sep 11, 2024

Conversation

EliahKagan
Copy link
Member

@EliahKagan EliahKagan commented Sep 11, 2024

This adds the ability of gix-testtools to generate archives of fixture output even when the environment is detected to be CI. This behavior is turned off by default, but enabled if GIX_TEST_CREATE_ARCHIVES_EVEN_ON_CI is set. A small amount of refactoring is included to avoid decreasing code readability while making this change.

Although there are a few plausible use cases for this, which along with other information are detailed in commit messages, the motivating use case is to let us use CI to find out if generated archives are out of date. This is something that has happened a number of times, and is not always immediately noticed, because it has had to be observed in the state of a repository after running tests locally.

On this project's CI, this newly recognized environment variable is:

  • Left unset in the test job.
  • Set in the test-fast jobs. The test-fast jobs will use existing generated archives if available, because GIX_TEST_IGNORE_ARCHIVES is not set for them, so this should only cause more work to be done on CI in situations where there something is wrong.

In addition to setting GIX_TEST_CREATE_ARCHIVES_EVEN_ON_CI for the test-fast jobs, this also adds a new step at the end to run git diff --exit-code. That actually detects if any new archives were create. This is because, in practice, as far as I know, they always end up being at least slightly different from whatever existed before for them, if anything.

There may be slightly more efficient ways to detect this, and maybe even slightly more reliable ways to detect it, but I think it makes sense to start with this way, because:

  • As noted in commit messages, there are some other plausible applications of the new environment variable anyway, including if other projects that use gix-testtools ever want to use CI to generate their archives and save them as artifacts for reuse.
  • This automates what is, in effect, the exact workflow that happens locally, for noticing that running the test suite produces a new or changed generated archive.

As of now, CI will fail on this PR, because there is an archive that needs to be regenerated. The new step can be expected to fails with:

Run git diff --exit-code
  git diff --exit-code
  shell: /bin/bash -e {0}
  env:
    CARGO_TERM_COLOR: always
    CLICOLOR: 1
    CARGO_HOME: /Users/runner/.cargo
    CARGO_INCREMENTAL: 0
    CACHE_ON_FAILURE: false
diff --git a/gix-worktree-state/tests/fixtures/generated-archives/make_dangling_symlink_to_windows_reserved.tar b/gix-worktree-state/tests/fixtures/generated-archives/make_dangling_symlink_to_windows_reserved.tar
index 3d9c416..4d50761 100644
Binary files a/gix-worktree-state/tests/fixtures/generated-archives/make_dangling_symlink_to_windows_reserved.tar and b/gix-worktree-state/tests/fixtures/generated-archives/make_dangling_symlink_to_windows_reserved.tar differ
Error: Process completed with exit code 1.

If #1589 is merged to main, and then this PR is rebased onto main (or main is merged into it afterwards), CI here should pass.

This renames the private funciton `create_archive_if_not_on_ci` in
gix-testtols to `create_archive_if_we_should`, because it already
checks for much more than whether we are on CI to decide whether
or not to create the archive. This archive is also (already) not
created if we are running on Windows, if the archive destination is
excluded by being ignored in any `.gitignore`, or if Git LFS is in
use and the archive is an LFS pointer file.
When `gix-testtools` runs fixtures, it now recognizes a new
environment variable, `GIX_TEST_CREATE_ARCHIVES_EVEN_ON_CI`,
specifying that archives should be generated even when on CI.

By default, they are still not generated when on CI. It may make
sense to enable this:

- If automatically testing archive creation, or

- As a way to check that all intended generated arhives are committed
  (which is the motivating use case for this feature), or

- If actually using CI to generate archives that will be uploaded
  as artifacts, or

- In unusual non-CI environments that are mis-detected as CI
  (though that should usually be investigated and fixed, since some
  software performs destructive operations more readily without
  interactive checks when CI is detected).

The usual reason for not generating archives on CI is that they
would not typically be preserved. Thus refraining from generating
them on CI remains the default behavior.

Like the `GIX_TEST_IGNORE_ARCHIVES` environment variable, the new
variable `GIX_TEST_CREATE_ARCHIVES_EVEN_ON_CI` is currently
interpreted as "true" based solely on its presence. This is to say
that is actual value is currently not examined.
This adds a step to `test-fast` to run `git diff` and fail if there
are any changes to any tracked files, and sets the newly recognized
`GIX_TEST_CREATE_ARCHIVES_EVEN_ON_CI` environment variable for the
`nextest` step in which fixture scripts are run.

Any changes would indicate a problem that should be addressed, but
the main goal here is to catch when a regenerated archive has not
been committed.

Because adding archives to `.gitignore` would also cause this to
pass but is far less often the appropriate solution, this change
includes a comment about what the best solution *usually* is.

This is added to the `test-fast` jobs but not to `test` job because
the `test` job uses `GIT_TEST_IGNORE_ARCHIVES` to prevent already
existing archives from being used, which will nearly always result
in at least slightly different archives being generated. So in the
`test` job this would, in practice, always give a false-positive
failure.

Because archive generation is (at least currently) suppressed on
Windows, this step should rarely if ever fail on Windows even when
the problem it is looking for is present. But if it does fail on
Windows, then there is a problem that ought to be investigated,
and if the problem it is looking for is present, then it will
still fail on the other systems, unless the problem is specific to
Windows.

Running this step on both Ubuntu and macOS (and Windows), rather
than just Ubuntu (and Windows), has the benefit of catching some
kinds of skew. It is also somewhat simpler than setting it only
for some operating systems.
So it actually returns a failing exit status if there were changes.
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, I love that this part of the workflow is finally automated and a little safer.

As a matter of fact, due to me using nightly locally and gix-macros tests failing due to that I stopped running just test locally entirely. Thus I wouldn't even notice if some archives are regenerated anymore.

On the negative side, as Git is updated, there might be test-failures due to that, making CI tests less reliable. However, for now I'd think we want to know and typically update the archives accordingly.

@Byron Byron enabled auto-merge September 11, 2024 07:25
@Byron Byron merged commit 4f92140 into GitoxideLabs:main Sep 11, 2024
15 checks passed
@EliahKagan EliahKagan deleted the run-ci/check-clean branch September 11, 2024 07:39
@EliahKagan
Copy link
Member Author

EliahKagan commented Sep 11, 2024

On the negative side, as Git is updated, there might be test-failures due to that, making CI tests less reliable.

I think that should only be expected to happen if both GIX_TEST_IGNORE_ARCHIVES and GIX_TEST_CREATE_ARCHIVES_EVEN_ON_CI are used together, which is why I did not do that on CI.

In that situation, the Git version and usually other details of the system would have to be the same to prevent changes to tracked files. For example, when I run cargo nextest run --all with GIX_TEST_IGNORE_ARCHIVES=1 locally on any system besides Windows (where archive creation is suppressed in gix-testtools), I get a large number of changed archives.

But with only GIX_TEST_CREATE_ARCHIVES_EVEN_ON_CI and not GIX_TEST_IGNORE_ARCHIVES, I believe the only fixture scripts that should be run are those whose associated archives are:

  • Excluded by being listed in a .gitignore file, in which case git diff won't show anything for them because they're ignored, or
  • Missing or out of date relative to the most recent change of the fixture script, in which case that is the situation we are looking for, or
  • Running because a problem in gix-testtools prevents it from identifying an archive or reuses an identity across multiple archives. But that also seems like something that ought to cause a failure. The latter was the cause of #1440 if I understand #1441 correctly.

However, if there are other situations where just the version of Git changing will cause a failure here, then I think a comment or something should be added to warn about it, possibly in the CI workflow where I already have a comment suggesting to regenerate the archive.

@Byron
Copy link
Member

Byron commented Sep 11, 2024

In that situation, the Git version and usually other details of the system would have to be the same to prevent changes to tracked files. For example, when I run cargo nextest run --all with GIX_TEST_IGNORE_ARCHIVES=1 locally on any system besides Windows (where archive creation is suppressed in gix-testtools), I get a large number of changed archives.

Ah, I see, thanks for elaborating. It's interesting how I am already not truly aware of how this all works anymore, and just take it for granted assuming it works correctly.

Thanks again for etching out the shortcomings!

@EliahKagan
Copy link
Member Author

One interesting aspect of this is that even on the same system, if I run GIX_TEST_IGNORE_ARCHIVES=1 cargo nextest run --all, stage, and then run it again, I get further changes. (Presumably, if I did not stage in between, I would also get further changes. Staging just lets me see the changes.)

I am not clear on why that is, and I would like to understand it better. I'd also like to make it so changes to generated archives are easier to examine, which might just entail documenting somewhere what configuration to use, but might also (or instead) involve changing .gitattributes, I am not sure.

After both runs (in case somehow it would affect them), I made this staged but uncommitted change to .gitattributes for simplicity, though I'm not sure it's really required:

diff --git a/.gitattributes b/.gitattributes
index 50d83ecdf..6245dce7c 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,4 +1,5 @@
 **/generated-archives/*.tar* filter=lfs-disabled diff=lfs merge=lfs -text
+*.tar diff=tar

 # assure line feeds don't interfere with our working copy hash
 **/tests/fixtures/**/*.sh text crlf=input      eol=lf

And also added this local configuration (I could have added an lfs configuration instead, which is why I am not sure the above change is required):

[diff "tar"]
        textconv = tar -tvf
        binary = true

So one question I have is what changes, if any, should be committed to .gitattributes to make it easier to investigate this.

But the other question is why I get the diffs I do. I investigated this on an Ubuntu 24.04 LTS system with git 2.43.0-1ubuntu7.1. Full results are in this gist.

The first diff, which is the output of git diff --staged except that I have removed the part showing changes to .gitattributes applied after the runs, represents the difference between the archives are they were committed, and the archives after I run the whole test suite and force them to be regenerated. It makes intuitive sense that there would be many changes, and a cursory examination of the changes is unsurprising: many seem like the sort of thing one would expect to be different across platforms or versions, such as what .sample hooks exist with what contents, though some of the other changes I am less clear on.

The second diff, which is the output of git diff, represents the difference between the archives after being recreated, versus after being recreated again in an immediately subsequent full test run. These changes are shorter, so I've included them below, and I also do not understand the reason for them:

diff --git a/gix-index/tests/fixtures/generated-archives/v2_split_index.tar b/gix-index/tests/fixtures/generated-archives/v2_split_index.tar
index 409da3baf..00983e9a7 100644
--- a/gix-index/tests/fixtures/generated-archives/v2_split_index.tar
+++ b/gix-index/tests/fixtures/generated-archives/v2_split_index.tar
@@ -5,6 +5,7 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 .git
 -rw-r--r-- 0/0              21 2006-07-24 01:21 .git/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/info
 -rw-r--r-- 0/0             240 2006-07-24 01:21 .git/info/exclude
+-rw-r--r-- 0/0              96 2006-07-24 01:21 .git/sharedindex.a7c54197d242f14f5c246cda5be747aee80393b8
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/hooks
 -rwxr-xr-x 0/0             189 2006-07-24 01:21 .git/hooks/post-update.sample
 -rwxr-xr-x 0/0             424 2006-07-24 01:21 .git/hooks/pre-applypatch.sample
@@ -20,7 +21,6 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/hooks
 -rwxr-xr-x 0/0            1492 2006-07-24 01:21 .git/hooks/prepare-commit-msg.sample
 -rwxr-xr-x 0/0            1643 2006-07-24 01:21 .git/hooks/pre-commit.sample
 -rwxr-xr-x 0/0            1374 2006-07-24 01:21 .git/hooks/pre-push.sample
--rw-r--r-- 0/0              96 2006-07-24 01:21 .git/sharedindex.772b543d373251f25f394435e8ff6a0ad429b5c1
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/logs
 -rw-r--r-- 0/0             156 2006-07-24 01:21 .git/logs/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/logs/refs
diff --git a/gix-index/tests/fixtures/generated-archives/v2_split_index_recursive.tar b/gix-index/tests/fixtures/generated-archives/v2_split_index_recursive.tar
index d7b836330..80745dfa2 100644
--- a/gix-index/tests/fixtures/generated-archives/v2_split_index_recursive.tar
+++ b/gix-index/tests/fixtures/generated-archives/v2_split_index_recursive.tar
@@ -1,8 +1,8 @@
 drwxr-xr-x 0/0               0 2006-07-24 01:21 ./
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git
 -rw-r--r-- 0/0              73 2006-07-24 01:21 .git/description
+-rw-r--r-- 0/0             205 2006-07-24 01:21 .git/sharedindex.f3e525d52de36e08a116a91a59b84ae2512073d3
 -rw-r--r-- 0/0             113 2006-07-24 01:21 .git/config
--rw-r--r-- 0/0             205 2006-07-24 01:21 .git/sharedindex.b03111262ebac0866ea3bae2ae5987cd2abdcaef
 -rw-r--r-- 0/0              21 2006-07-24 01:21 .git/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/info
 -rw-r--r-- 0/0             240 2006-07-24 01:21 .git/info/exclude
diff --git a/gix-index/tests/fixtures/generated-archives/v2_split_vs_regular_index.tar b/gix-index/tests/fixtures/generated-archives/v2_split_vs_regular_index.tar
index 78c144b13..25f67056d 100644
--- a/gix-index/tests/fixtures/generated-archives/v2_split_vs_regular_index.tar
+++ b/gix-index/tests/fixtures/generated-archives/v2_split_vs_regular_index.tar
@@ -72,7 +72,6 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 split
 -rw-r--r-- 0/0               2 2006-07-24 01:21 split/z
 drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git
 -rw-r--r-- 0/0              73 2006-07-24 01:21 split/.git/description
--rw-r--r-- 0/0             416 2006-07-24 01:21 split/.git/sharedindex.1183b4544429fd6795bd9c3e1babd628a19bf369
 -rw-r--r-- 0/0             129 2006-07-24 01:21 split/.git/config
 -rw-r--r-- 0/0              21 2006-07-24 01:21 split/.git/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git/info
@@ -92,6 +91,7 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git/hooks
 -rwxr-xr-x 0/0            1492 2006-07-24 01:21 split/.git/hooks/prepare-commit-msg.sample
 -rwxr-xr-x 0/0            1643 2006-07-24 01:21 split/.git/hooks/pre-commit.sample
 -rwxr-xr-x 0/0            1374 2006-07-24 01:21 split/.git/hooks/pre-push.sample
+-rw-r--r-- 0/0             416 2006-07-24 01:21 split/.git/sharedindex.b65ac769815289980f011f9ae3406be2c8fa8407
 drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git/logs
 -rw-r--r-- 0/0             305 2006-07-24 01:21 split/.git/logs/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git/logs/refs
diff --git a/gix-negotiate/tests/fixtures/generated-archives/make_repos.tar b/gix-negotiate/tests/fixtures/generated-archives/make_repos.tar
index f4b7cc55f..021b86233 100644
--- a/gix-negotiate/tests/fixtures/generated-archives/make_repos.tar
+++ b/gix-negotiate/tests/fixtures/generated-archives/make_repos.tar
@@ -758,9 +758,9 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 multi_round/client/.git/objects/
 -rw-r--r-- 0/0           10712 2006-07-24 01:21 multi_round/client/.git/objects/info/commit-graph
 -rw-r--r-- 0/0              54 2006-07-24 01:21 multi_round/client/.git/objects/info/packs
 drwxr-xr-x 0/0               0 2006-07-24 01:21 multi_round/client/.git/objects/pack
--rw-r--r-- 0/0            1972 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-fe61307136aecbdd8c8dfe398d26986291921283.rev
--rw-r--r-- 0/0           59451 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-fe61307136aecbdd8c8dfe398d26986291921283.pack
--rw-r--r-- 0/0           14512 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-fe61307136aecbdd8c8dfe398d26986291921283.idx
+-rw-r--r-- 0/0            1972 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-0d92e2ea6533a6622cdab3bd74bccdce3d85f34e.rev
+-rw-r--r-- 0/0           14512 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-0d92e2ea6533a6622cdab3bd74bccdce3d85f34e.idx
+-rw-r--r-- 0/0           59340 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-0d92e2ea6533a6622cdab3bd74bccdce3d85f34e.pack
 -rw-r--r-- 0/0               7 2006-07-24 01:21 multi_round/client/b8.c12.t
 -rw-r--r-- 0/0               6 2006-07-24 01:21 multi_round/client/b8.c3.t
 -rw-r--r-- 0/0               7 2006-07-24 01:21 multi_round/client/b8.c17.t
diff --git a/gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar b/gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar
index 92d7ef83a..332922625 100644
--- a/gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar
+++ b/gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar
@@ -691,42 +691,42 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/objects/cf
 -rw-r--r-- 0/0              24 2006-07-24 01:21 .git/objects/cf/476b3404986ec61542145808a5c404d2a42789
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/objects/pack
 -rw-r--r-- 0/0            4669 2006-07-24 01:21 .git/objects/pack/pack-2bbdc0e28b43868ee99e4be43a1792a40b4e9129.pack
+-rw-r--r-- 0/0            3368 2006-07-24 01:21 .git/objects/pack/pack-1ad50de26d8845bd6bb80a060f7d7c3f00505fb7.idx
 -rw-r--r-- 0/0           26060 2006-07-24 01:21 .git/objects/pack/multi-pack-index
 -rw-r--r-- 0/0             264 2006-07-24 01:21 .git/objects/pack/pack-9b7062fbd0582882cab3454d7b3de09f1578bfa8.rev
 -rw-r--r-- 0/0            1828 2006-07-24 01:21 .git/objects/pack/pack-8c07ea08d9a9a4be5bca3e7fc246893243950f6d.idx
 -rw-r--r-- 0/0             136 2006-07-24 01:21 .git/objects/pack/pack-2aa9aa8c891b01a5dfd811c9ad9e1381dd469e39.rev
 -rw-r--r-- 0/0            3172 2006-07-24 01:21 .git/objects/pack/pack-572114382e4d12da25a0587fbc715315ece056bb.idx
--rw-r--r-- 0/0            3900 2006-07-24 01:21 .git/objects/pack/pack-e3a08db09c6b3290bca8c291d47e2810d6075392.idx
 -rw-r--r-- 0/0             236 2006-07-24 01:21 .git/objects/pack/pack-f8777e0a8830020ac1c2e49d399356bbb0f4d48b.rev
+-rw-r--r-- 0/0            3536 2006-07-24 01:21 .git/objects/pack/pack-1153fd15cfceb0c7f356c0d63111b0654c495a8b.idx
 -rw-r--r-- 0/0            2556 2006-07-24 01:21 .git/objects/pack/pack-9b7062fbd0582882cab3454d7b3de09f1578bfa8.idx
 -rw-r--r-- 0/0            2221 2006-07-24 01:21 .git/objects/pack/pack-5bc9b6bf1bf64398bfb3e435518e93d3e145382e.pack
 -rw-r--r-- 0/0            3854 2006-07-24 01:21 .git/objects/pack/pack-9b7062fbd0582882cab3454d7b3de09f1578bfa8.pack
 -rw-r--r-- 0/0             428 2006-07-24 01:21 .git/objects/pack/pack-9c7e898ea061ab4af575def7f9d0e6836d727871.rev
 -rw-r--r-- 0/0            1238 2006-07-24 01:21 .git/objects/pack/pack-2aa9aa8c891b01a5dfd811c9ad9e1381dd469e39.pack
 -rw-r--r-- 0/0            4931 2006-07-24 01:21 .git/objects/pack/pack-dd8f4f04f6c665f86f8a954204f89a3807fc890f.pack
--rw-r--r-- 0/0            7087 2006-07-24 01:21 .git/objects/pack/pack-818fb3b664fab9435175f66012273ef903ea893c.pack
--rw-r--r-- 0/0             456 2006-07-24 01:21 .git/objects/pack/pack-e3a08db09c6b3290bca8c291d47e2810d6075392.rev
--rw-r--r-- 0/0             380 2006-07-24 01:21 .git/objects/pack/pack-818fb3b664fab9435175f66012273ef903ea893c.rev
+-rw-r--r-- 0/0            7457 2006-07-24 01:21 .git/objects/pack/pack-1153fd15cfceb0c7f356c0d63111b0654c495a8b.pack
+-rw-r--r-- 0/0            7198 2006-07-24 01:21 .git/objects/pack/pack-1ad50de26d8845bd6bb80a060f7d7c3f00505fb7.pack
 -rw-r--r-- 0/0             304 2006-07-24 01:21 .git/objects/pack/pack-2bbdc0e28b43868ee99e4be43a1792a40b4e9129.rev
+-rw-r--r-- 0/0             380 2006-07-24 01:21 .git/objects/pack/pack-1ad50de26d8845bd6bb80a060f7d7c3f00505fb7.rev
 -rw-r--r-- 0/0             352 2006-07-24 01:21 .git/objects/pack/pack-572114382e4d12da25a0587fbc715315ece056bb.rev
 -rw-r--r-- 0/0             276 2006-07-24 01:21 .git/objects/pack/pack-8b66663f0e277f430c0574bca2ad2d27af318b0b.rev
 -rw-r--r-- 0/0            2024 2006-07-24 01:21 .git/objects/pack/pack-5bc9b6bf1bf64398bfb3e435518e93d3e145382e.idx
 -rw-r--r-- 0/0            1660 2006-07-24 01:21 .git/objects/pack/pack-2aa9aa8c891b01a5dfd811c9ad9e1381dd469e39.idx
--rw-r--r-- 0/0           10084 2006-07-24 01:21 .git/objects/pack/pack-e3a08db09c6b3290bca8c291d47e2810d6075392.pack
--rw-r--r-- 0/0            7550 2006-07-24 01:21 .git/objects/pack/pack-1e0a20554ca41743e224be67a9f273e253f5b461.pack
diff --git a/gix-index/tests/fixtures/generated-archives/v2_split_index.tar b/gix-index/tests/fixtures/generated-archives/v2_split_index.tar
index 409da3baf..00983e9a7 100644
--- a/gix-index/tests/fixtures/generated-archives/v2_split_index.tar
+++ b/gix-index/tests/fixtures/generated-archives/v2_split_index.tar
@@ -5,6 +5,7 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 .git
 -rw-r--r-- 0/0              21 2006-07-24 01:21 .git/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/info
 -rw-r--r-- 0/0             240 2006-07-24 01:21 .git/info/exclude
+-rw-r--r-- 0/0              96 2006-07-24 01:21 .git/sharedindex.a7c54197d242f14f5c246cda5be747aee80393b8
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/hooks
 -rwxr-xr-x 0/0             189 2006-07-24 01:21 .git/hooks/post-update.sample
 -rwxr-xr-x 0/0             424 2006-07-24 01:21 .git/hooks/pre-applypatch.sample
@@ -20,7 +21,6 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/hooks
 -rwxr-xr-x 0/0            1492 2006-07-24 01:21 .git/hooks/prepare-commit-msg.sample
 -rwxr-xr-x 0/0            1643 2006-07-24 01:21 .git/hooks/pre-commit.sample
 -rwxr-xr-x 0/0            1374 2006-07-24 01:21 .git/hooks/pre-push.sample
--rw-r--r-- 0/0              96 2006-07-24 01:21 .git/sharedindex.772b543d373251f25f394435e8ff6a0ad429b5c1
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/logs
 -rw-r--r-- 0/0             156 2006-07-24 01:21 .git/logs/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/logs/refs
diff --git a/gix-index/tests/fixtures/generated-archives/v2_split_index_recursive.tar b/gix-index/tests/fixtures/generated-archives/v2_split_index_recursive.tar
index d7b836330..80745dfa2 100644
--- a/gix-index/tests/fixtures/generated-archives/v2_split_index_recursive.tar
+++ b/gix-index/tests/fixtures/generated-archives/v2_split_index_recursive.tar
@@ -1,8 +1,8 @@
 drwxr-xr-x 0/0               0 2006-07-24 01:21 ./
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git
 -rw-r--r-- 0/0              73 2006-07-24 01:21 .git/description
+-rw-r--r-- 0/0             205 2006-07-24 01:21 .git/sharedindex.f3e525d52de36e08a116a91a59b84ae2512073d3
 -rw-r--r-- 0/0             113 2006-07-24 01:21 .git/config
--rw-r--r-- 0/0             205 2006-07-24 01:21 .git/sharedindex.b03111262ebac0866ea3bae2ae5987cd2abdcaef
 -rw-r--r-- 0/0              21 2006-07-24 01:21 .git/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/info
 -rw-r--r-- 0/0             240 2006-07-24 01:21 .git/info/exclude
diff --git a/gix-index/tests/fixtures/generated-archives/v2_split_vs_regular_index.tar b/gix-index/tests/fixtures/generated-archives/v2_split_vs_regular_index.tar
index 78c144b13..25f67056d 100644
--- a/gix-index/tests/fixtures/generated-archives/v2_split_vs_regular_index.tar
+++ b/gix-index/tests/fixtures/generated-archives/v2_split_vs_regular_index.tar
@@ -72,7 +72,6 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 split
 -rw-r--r-- 0/0               2 2006-07-24 01:21 split/z
 drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git
 -rw-r--r-- 0/0              73 2006-07-24 01:21 split/.git/description
--rw-r--r-- 0/0             416 2006-07-24 01:21 split/.git/sharedindex.1183b4544429fd6795bd9c3e1babd628a19bf369
 -rw-r--r-- 0/0             129 2006-07-24 01:21 split/.git/config
 -rw-r--r-- 0/0              21 2006-07-24 01:21 split/.git/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git/info
@@ -92,6 +91,7 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git/hooks
 -rwxr-xr-x 0/0            1492 2006-07-24 01:21 split/.git/hooks/prepare-commit-msg.sample
 -rwxr-xr-x 0/0            1643 2006-07-24 01:21 split/.git/hooks/pre-commit.sample
 -rwxr-xr-x 0/0            1374 2006-07-24 01:21 split/.git/hooks/pre-push.sample
+-rw-r--r-- 0/0             416 2006-07-24 01:21 split/.git/sharedindex.b65ac769815289980f011f9ae3406be2c8fa8407
 drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git/logs
 -rw-r--r-- 0/0             305 2006-07-24 01:21 split/.git/logs/HEAD
 drwxr-xr-x 0/0               0 2006-07-24 01:21 split/.git/logs/refs
diff --git a/gix-negotiate/tests/fixtures/generated-archives/make_repos.tar b/gix-negotiate/tests/fixtures/generated-archives/make_repos.tar
index f4b7cc55f..021b86233 100644
--- a/gix-negotiate/tests/fixtures/generated-archives/make_repos.tar
+++ b/gix-negotiate/tests/fixtures/generated-archives/make_repos.tar
@@ -758,9 +758,9 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 multi_round/client/.git/objects/
 -rw-r--r-- 0/0           10712 2006-07-24 01:21 multi_round/client/.git/objects/info/commit-graph
 -rw-r--r-- 0/0              54 2006-07-24 01:21 multi_round/client/.git/objects/info/packs
 drwxr-xr-x 0/0               0 2006-07-24 01:21 multi_round/client/.git/objects/pack
--rw-r--r-- 0/0            1972 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-fe61307136aecbdd8c8dfe398d26986291921283.rev
--rw-r--r-- 0/0           59451 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-fe61307136aecbdd8c8dfe398d26986291921283.pack
--rw-r--r-- 0/0           14512 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-fe61307136aecbdd8c8dfe398d26986291921283.idx
+-rw-r--r-- 0/0            1972 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-0d92e2ea6533a6622cdab3bd74bccdce3d85f34e.rev
+-rw-r--r-- 0/0           14512 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-0d92e2ea6533a6622cdab3bd74bccdce3d85f34e.idx
+-rw-r--r-- 0/0           59340 2006-07-24 01:21 multi_round/client/.git/objects/pack/pack-0d92e2ea6533a6622cdab3bd74bccdce3d85f34e.pack
 -rw-r--r-- 0/0               7 2006-07-24 01:21 multi_round/client/b8.c12.t
 -rw-r--r-- 0/0               6 2006-07-24 01:21 multi_round/client/b8.c3.t
 -rw-r--r-- 0/0               7 2006-07-24 01:21 multi_round/client/b8.c17.t
diff --git a/gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar b/gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar
index 92d7ef83a..332922625 100644
--- a/gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar
+++ b/gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar
@@ -691,42 +691,42 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/objects/cf
 -rw-r--r-- 0/0              24 2006-07-24 01:21 .git/objects/cf/476b3404986ec61542145808a5c404d2a42789
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/objects/pack
 -rw-r--r-- 0/0            4669 2006-07-24 01:21 .git/objects/pack/pack-2bbdc0e28b43868ee99e4be43a1792a40b4e9129.pack
+-rw-r--r-- 0/0            3368 2006-07-24 01:21 .git/objects/pack/pack-1ad50de26d8845bd6bb80a060f7d7c3f00505fb7.idx
 -rw-r--r-- 0/0           26060 2006-07-24 01:21 .git/objects/pack/multi-pack-index
 -rw-r--r-- 0/0             264 2006-07-24 01:21 .git/objects/pack/pack-9b7062fbd0582882cab3454d7b3de09f1578bfa8.rev
 -rw-r--r-- 0/0            1828 2006-07-24 01:21 .git/objects/pack/pack-8c07ea08d9a9a4be5bca3e7fc246893243950f6d.idx
 -rw-r--r-- 0/0             136 2006-07-24 01:21 .git/objects/pack/pack-2aa9aa8c891b01a5dfd811c9ad9e1381dd469e39.rev
 -rw-r--r-- 0/0            3172 2006-07-24 01:21 .git/objects/pack/pack-572114382e4d12da25a0587fbc715315ece056bb.idx
--rw-r--r-- 0/0            3900 2006-07-24 01:21 .git/objects/pack/pack-e3a08db09c6b3290bca8c291d47e2810d6075392.idx
 -rw-r--r-- 0/0             236 2006-07-24 01:21 .git/objects/pack/pack-f8777e0a8830020ac1c2e49d399356bbb0f4d48b.rev
+-rw-r--r-- 0/0            3536 2006-07-24 01:21 .git/objects/pack/pack-1153fd15cfceb0c7f356c0d63111b0654c495a8b.idx
 -rw-r--r-- 0/0            2556 2006-07-24 01:21 .git/objects/pack/pack-9b7062fbd0582882cab3454d7b3de09f1578bfa8.idx
 -rw-r--r-- 0/0            2221 2006-07-24 01:21 .git/objects/pack/pack-5bc9b6bf1bf64398bfb3e435518e93d3e145382e.pack
 -rw-r--r-- 0/0            3854 2006-07-24 01:21 .git/objects/pack/pack-9b7062fbd0582882cab3454d7b3de09f1578bfa8.pack
 -rw-r--r-- 0/0             428 2006-07-24 01:21 .git/objects/pack/pack-9c7e898ea061ab4af575def7f9d0e6836d727871.rev
 -rw-r--r-- 0/0            1238 2006-07-24 01:21 .git/objects/pack/pack-2aa9aa8c891b01a5dfd811c9ad9e1381dd469e39.pack
 -rw-r--r-- 0/0            4931 2006-07-24 01:21 .git/objects/pack/pack-dd8f4f04f6c665f86f8a954204f89a3807fc890f.pack
--rw-r--r-- 0/0            7087 2006-07-24 01:21 .git/objects/pack/pack-818fb3b664fab9435175f66012273ef903ea893c.pack
--rw-r--r-- 0/0             456 2006-07-24 01:21 .git/objects/pack/pack-e3a08db09c6b3290bca8c291d47e2810d6075392.rev
--rw-r--r-- 0/0             380 2006-07-24 01:21 .git/objects/pack/pack-818fb3b664fab9435175f66012273ef903ea893c.rev
+-rw-r--r-- 0/0            7457 2006-07-24 01:21 .git/objects/pack/pack-1153fd15cfceb0c7f356c0d63111b0654c495a8b.pack
+-rw-r--r-- 0/0            7198 2006-07-24 01:21 .git/objects/pack/pack-1ad50de26d8845bd6bb80a060f7d7c3f00505fb7.pack
 -rw-r--r-- 0/0             304 2006-07-24 01:21 .git/objects/pack/pack-2bbdc0e28b43868ee99e4be43a1792a40b4e9129.rev
+-rw-r--r-- 0/0             380 2006-07-24 01:21 .git/objects/pack/pack-1ad50de26d8845bd6bb80a060f7d7c3f00505fb7.rev
 -rw-r--r-- 0/0             352 2006-07-24 01:21 .git/objects/pack/pack-572114382e4d12da25a0587fbc715315ece056bb.rev
 -rw-r--r-- 0/0             276 2006-07-24 01:21 .git/objects/pack/pack-8b66663f0e277f430c0574bca2ad2d27af318b0b.rev
 -rw-r--r-- 0/0            2024 2006-07-24 01:21 .git/objects/pack/pack-5bc9b6bf1bf64398bfb3e435518e93d3e145382e.idx
 -rw-r--r-- 0/0            1660 2006-07-24 01:21 .git/objects/pack/pack-2aa9aa8c891b01a5dfd811c9ad9e1381dd469e39.idx
--rw-r--r-- 0/0           10084 2006-07-24 01:21 .git/objects/pack/pack-e3a08db09c6b3290bca8c291d47e2810d6075392.pack
--rw-r--r-- 0/0            7550 2006-07-24 01:21 .git/objects/pack/pack-1e0a20554ca41743e224be67a9f273e253f5b461.pack
--rw-r--r-- 0/0            3536 2006-07-24 01:21 .git/objects/pack/pack-1e0a20554ca41743e224be67a9f273e253f5b461.idx
 -rw-r--r-- 0/0            2640 2006-07-24 01:21 .git/objects/pack/pack-8b66663f0e277f430c0574bca2ad2d27af318b0b.idx
 -rw-r--r-- 0/0            1608 2006-07-24 01:21 .git/objects/pack/pack-8c07ea08d9a9a4be5bca3e7fc246893243950f6d.pack
 -rw-r--r-- 0/0            2360 2006-07-24 01:21 .git/objects/pack/pack-f8777e0a8830020ac1c2e49d399356bbb0f4d48b.idx
+-rw-r--r-- 0/0             404 2006-07-24 01:21 .git/objects/pack/pack-1153fd15cfceb0c7f356c0d63111b0654c495a8b.rev
 -rw-r--r-- 0/0            3004 2006-07-24 01:21 .git/objects/pack/pack-dd8f4f04f6c665f86f8a954204f89a3807fc890f.idx
--rw-r--r-- 0/0            3368 2006-07-24 01:21 .git/objects/pack/pack-818fb3b664fab9435175f66012273ef903ea893c.idx
 -rw-r--r-- 0/0            2795 2006-07-24 01:21 .git/objects/pack/pack-f8777e0a8830020ac1c2e49d399356bbb0f4d48b.pack
 -rw-r--r-- 0/0            1492 2006-07-24 01:21 .git/objects/pack/pack-b736b4d7459841f7c6d1181c18c9685a2b1e75c2.idx
 -rw-r--r-- 0/0            3704 2006-07-24 01:21 .git/objects/pack/pack-9c7e898ea061ab4af575def7f9d0e6836d727871.idx
 -rw-r--r-- 0/0            2836 2006-07-24 01:21 .git/objects/pack/pack-2bbdc0e28b43868ee99e4be43a1792a40b4e9129.idx
 -rw-r--r-- 0/0             112 2006-07-24 01:21 .git/objects/pack/pack-b736b4d7459841f7c6d1181c18c9685a2b1e75c2.rev
+-rw-r--r-- 0/0             456 2006-07-24 01:21 .git/objects/pack/pack-265d5a5a1b49208356d72008af5870937f764bef.rev
 -rw-r--r-- 0/0             160 2006-07-24 01:21 .git/objects/pack/pack-8c07ea08d9a9a4be5bca3e7fc246893243950f6d.rev
 -rw-r--r-- 0/0            2906 2006-07-24 01:21 .git/objects/pack/pack-8b66663f0e277f430c0574bca2ad2d27af318b0b.pack
+-rw-r--r-- 0/0            3900 2006-07-24 01:21 .git/objects/pack/pack-265d5a5a1b49208356d72008af5870937f764bef.idx
 -rw-r--r-- 0/0             212 2006-07-24 01:21 .git/objects/pack/pack-202904d12f68b77afa1da3d9708144023d6a999c.rev
 -rw-r--r-- 0/0            2192 2006-07-24 01:21 .git/objects/pack/pack-202904d12f68b77afa1da3d9708144023d6a999c.idx
 -rw-r--r-- 0/0            2518 2006-07-24 01:21 .git/objects/pack/pack-202904d12f68b77afa1da3d9708144023d6a999c.pack
@@ -734,8 +734,8 @@ drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/objects/pack
 -rw-r--r-- 0/0             188 2006-07-24 01:21 .git/objects/pack/pack-5bc9b6bf1bf64398bfb3e435518e93d3e145382e.rev
 -rw-r--r-- 0/0             328 2006-07-24 01:21 .git/objects/pack/pack-dd8f4f04f6c665f86f8a954204f89a3807fc890f.rev
 -rw-r--r-- 0/0            5214 2006-07-24 01:21 .git/objects/pack/pack-572114382e4d12da25a0587fbc715315ece056bb.pack
--rw-r--r-- 0/0             404 2006-07-24 01:21 .git/objects/pack/pack-1e0a20554ca41743e224be67a9f273e253f5b461.rev
 -rw-r--r-- 0/0             800 2006-07-24 01:21 .git/objects/pack/pack-b736b4d7459841f7c6d1181c18c9685a2b1e75c2.pack
+-rw-r--r-- 0/0            9951 2006-07-24 01:21 .git/objects/pack/pack-265d5a5a1b49208356d72008af5870937f764bef.pack
 drwxr-xr-x 0/0               0 2006-07-24 01:21 .git/objects/c8
 -rw-r--r-- 0/0              38 2006-07-24 01:21 .git/objects/c8/cc550204e043706fa1618ef7fb9c450f04d22b
 -rw-r--r-- 0/0             146 2006-07-24 01:21 .git/objects/c8/553151f7776f597a3a3fe80d070aa3c8c5d689

In some cases, hashes shown in pack filenames differ. In others, filenames remain the same yet are for some reason listed in a different order. Is some of this related to the effect of running tests concurrently? (With cargo nextest, they are either mostly or entirely run in different processes rather than in different threads of the same process, so I expect that multiple fixture scripts actually can run at the same time.)

@Byron
Copy link
Member

Byron commented Sep 13, 2024

Thanks for sharing - I think I also noticed that the archives could change quite a bit and appreciate looking into 'stabilizing' archive generation.

So one question I have is what changes, if any, should be committed to .gitattributes to make it easier to investigate this.

Please feel free to add these proposed changes to .gitattributes, maybe even with some garnish of comments to share which git-configuration would be needed to make it effective.

In some cases, hashes shown in pack filenames differ. In others, filenames remain the same yet are for some reason listed in a different order. Is some of this related to the effect of running tests concurrently?

I didn't notice that they just moved while filenames stayed the same, but also find it strange that the hashes differ. That would mean the objects differ, and those should always be frozen to a specific date. I also don't know of any file format that would contain the time (or any other variable), so formats should be stable if the content is stable.

v2_split_index.tar seems like a good candidate to investigate as only one file seemingly changed. Another reason for changes might be an unsorted traversal over the contents of directories when creating the tar archive, maybe that's something that can be changed as well to get stable archives.

(With cargo nextest, they are either mostly or entirely run in different processes rather than in different threads of the same process, so I expect that multiple fixture scripts actually can run at the same time.)

Multiple different fixture scripts can run at the same time, as each holds a file-based lock to assure exclusive access to the target directory. This is to assure that only the first thread or process will create the result of a read-only script, while all that follow wait and just use the result. Writeable scripts don't perform locking as their target directory is assume to be exclusive already.

@EliahKagan
Copy link
Member Author

EliahKagan commented Sep 14, 2024

v2_split_index.tar seems like a good candidate to investigate as only one file seemingly changed

Here's some preliminary information about that, using gin on the extracted sharedindex.* files:

ek@walpurti:~$ gin sharedindex.772b543d373251f25f394435e8ff6a0ad429b5c1
[header]
  signature = DIRC
  version = 2
  entries = 1

[entry]
  entry = 1
  ctime = 1726081166.7460086
  mtime = 1726081166.7460086
  dev = 2049
  ino = 1356183
  mode = 100644
  uid = 1000
  gid = 1000
  size = 0
  sha1 = e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
  flags = 1
  assume-valid = False
  extended = False
  stage = (False, False)
  name = a

[checksum]
  checksum = True
  sha1 = 772b543d373251f25f394435e8ff6a0ad429b5c1
ek@walpurti:~$ gin sharedindex.a7c54197d242f14f5c246cda5be747aee80393b8
[header]
  signature = DIRC
  version = 2
  entries = 1

[entry]
  entry = 1
  ctime = 1726082332.4457066
  mtime = 1726082332.4457066
  dev = 2049
  ino = 1393893
  mode = 100644
  uid = 1000
  gid = 1000
  size = 0
  sha1 = e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
  flags = 1
  assume-valid = False
  extended = False
  stage = (False, False)
  name = a

[checksum]
  checksum = True
  sha1 = a7c54197d242f14f5c246cda5be747aee80393b8
ek@walpurti:~$ git diff --no-index <(gin sharedindex.772b543d373251f25f394435e8ff6a0ad429b5c1) <(gin sharedindex.a7c54197d242f14f5c246cda5be747aee80393b8)
diff --git a/dev/fd/63 b/dev/fd/62
--- a/dev/fd/63
+++ b/dev/fd/62
@@ -5,10 +5,10 @@

 [entry]
   entry = 1
-  ctime = 1726081166.7460086
-  mtime = 1726081166.7460086
+  ctime = 1726082332.4457066
+  mtime = 1726082332.4457066
   dev = 2049
-  ino = 1356183
+  ino = 1393893
   mode = 100644
   uid = 1000
   gid = 1000
@@ -22,4 +22,4 @@

 [checksum]
   checksum = True
-  sha1 = 772b543d373251f25f394435e8ff6a0ad429b5c1
+  sha1 = a7c54197d242f14f5c246cda5be747aee80393b8

So it looks like changes seem to be, or at least include, metadata on the actual filesystem that the V2 index files maintain for the purpose of speeding up change detection.

Because these are filesystem metadata (some of which--at least the inode numbers--Git does not even represent in repositories), it makes sense to me that it efforts to ensure configuration and metadata pertaining to Git are the same every time a fixture script is run would be insufficient to ensure these filesystem metadata are the same.

Multiple different fixture scripts can run at the same time, as each holds a file-based lock to assure exclusive access to the target directory. This is to assure that only the first thread or process will create the result of a read-only script, while all that follow wait and just use the result. Writeable scripts don't perform locking as their target directory is assume to be exclusive already.

Yes, this makes sense, and I was not really thinking clearly in suggesting that concurrency may have had to do with the differences.

@Byron
Copy link
Member

Byron commented Sep 14, 2024

Thanks for looking into this! With the index changes, I could have known 😅. Indeed there is nothing that can easily be done. For a moment I thought one could rewrite the index to unify these value, but I don't think it's worth the effort.

What's left for me to understand is how packs can be different. Timestamps and author names are controlled, so all objects should always be the same. Maybe it's not the content of the packs that change, but only their checksum which is made over compressed data as well. And the compressor may always be a little different for many reasons, particularly across systems.

Assuming the above is correct, I think this explains the differences, and would clearly put them into the 'something to accept' category. Maybe that's OK as well?

@EliahKagan
Copy link
Member Author

Thanks for looking into this! With the index changes, I could have known 😅. Indeed there is nothing that can easily be done. For a moment I thought one could rewrite the index to unify these value, but I don't think it's worth the effort.

Assuming the above is correct, I think this explains the differences, and would clearly put them into the 'something to accept' category. Maybe that's OK as well?

I agree that modifying the index files once they are created is not worthwhile. If this were done, then when investigating future strange things that turn out to be unrelated, one would occasionally wonder if the modifications were themselves somehow wrong, even in the absence of any actual bugs in how the modifications were made. In contrast, keeping the files are they were really created by git makes it so there will be fewer things to investigate. Keeping them "honest" also has the benefit that they can be used more readily (and with less explanation) when reporting bugs outside gitoxide, such as when reporting bugs in Git.

Keeping things as they are seems okay to me, but there are three other options:

  1. I don't recommend this, but we can probably cause the device and inode numbers to actually be the same: the inode numbers could at least in principle be made the same by using a carefully constructed squashfs filesystem, and the inode number for that filesystem may be possible to make the same even when regenerating archives on different systems by using containerization and making sure it is always mapped the same way. I am not actually 100% confident that this will reliably work, but the bigger issue is that it is incurs substantial effort, and substantial limitation on the environment in which archives are regenerated to be committed.
  2. Ensure archives always in practice differ, but in a way that does not make diffs hard to read, so that non-intentional differences can still be spotted when desired. So timestamps, at least for files that any tests potentially use (i.e. files currently in the archives), should be kept the same. But a new file could be added to each archive with a current timestamp and/or with a random value as its content. By making them always differ, the technique introduced in this PR would always be effective at detecting when a currently committed generated archives will not be used.
  3. Keep archive generation the same, but refine or replace the technique introduced in this PR so it more directly checks--or includes a more direct check--what archives are actually generated. The best way to do that might be to take a wholly different approach from what I did here, and add functionality to gix-testtools to quickly check if any archives, other than those that are .gitigored, would need to be generated. Another approach could be to do this outside gix-testtools by checking for new, changed, or removed fixture scripts in the absence of what appears to be a corresponding change to generated archives in the repository.

What's left for me to understand is how packs can be different. Timestamps and author names are controlled, so all objects should always be the same. Maybe it's not the content of the packs that change, but only their checksum which is made over compressed data as well. And the compressor may always be a little different for many reasons, particularly across systems.

make_repos.tar was the archive in which one pack and its associated files, but nothing else, was changed, so I looked into that a bit, so far only examining the .pack files. Details are in this gist.

I don't know what's going on here. Cursory examination of the actual repositories suggests that the objects are the same, though I have not really verified that this is the case. But I think the mystery can be solved by examining the changes in the output of git verify-pack -v on the .pack files, possibly even without checking the .idx and .rev files. I did that in the gist, and also included a diff of the output, though the diff would be more useful if I had made it more granular.

Starting in the middle of the output at the first changed line, here are the first 10 lines before regenerating the archives (i.e., actually after regenerating them the first time but not the second, per the procedure described above):

84f9c8767d96afe6e080271e613e51be5bd829ac tree   566 434 36051
9e88bc3f090bb37f8890abdc29f4375e0d806760 tree   12 23 36485 1 0a10084d765d90854e2bc548df400f23812680cf
71f30135b6e3ada2575c01919d3ebccfdaea79da tree   12 23 36508 1 a6db8c57d2358a97a00db2102926402ac0504751
f41b7b3c34a99d01ea840e13f9e82e1b7a962161 tree   12 23 36531 1 dd2143845727fb6adc72f7b053b9093ca9c05160
748c524eb71e57e3b4d105ba29db24e1fefd335c tree   670 515 36554
e09a018c615016f37b31421e52e0929f613f25bb tree   635 487 37069
ce8a2638c9cc62255b55fde7b2a1533fffaa4cc4 tree   600 463 37556
4b410c35e9ea1c4b03c28886aae751726acc420d tree   565 436 38019
ee139939d2141fa9b5abf467bbddab5c53882244 tree   530 409 38455
4a735977214a2df6acac52ca5ae1bc9dd12db700 tree   12 23 38864 1 0a10084d765d90854e2bc548df400f23812680cf

And here are the first 10 lines after regenerating the archives (i.e., after re-regenerating them):

84f9c8767d96afe6e080271e613e51be5bd829ac tree   12 23 36051 1 c3adda84e145206e063dadb5702cc84e42068d71
9e88bc3f090bb37f8890abdc29f4375e0d806760 tree   12 23 36074 1 0a10084d765d90854e2bc548df400f23812680cf
71f30135b6e3ada2575c01919d3ebccfdaea79da tree   12 23 36097 1 a6db8c57d2358a97a00db2102926402ac0504751
f41b7b3c34a99d01ea840e13f9e82e1b7a962161 tree   12 23 36120 1 dd2143845727fb6adc72f7b053b9093ca9c05160
748c524eb71e57e3b4d105ba29db24e1fefd335c tree   670 515 36143
e09a018c615016f37b31421e52e0929f613f25bb tree   635 487 36658
ce8a2638c9cc62255b55fde7b2a1533fffaa4cc4 tree   600 463 37145
4b410c35e9ea1c4b03c28886aae751726acc420d tree   12 23 37608 1 879c3581b71bdbae7cbf69718960ab1d478c769e
ee139939d2141fa9b5abf467bbddab5c53882244 tree   12 23 37631 1 c3adda84e145206e063dadb5702cc84e42068d71
4a735977214a2df6acac52ca5ae1bc9dd12db700 tree   12 23 37654 1 0a10084d765d90854e2bc548df400f23812680cf

Simplifying even further by looking at just the first changed line:

-84f9c8767d96afe6e080271e613e51be5bd829ac tree   566 434 36051
+84f9c8767d96afe6e080271e613e51be5bd829ac tree   12 23 36051 1 c3adda84e145206e063dadb5702cc84e42068d71

The output of git ls-tree 84f9c8767d96afe6e080271e613e51be5bd829ac is the same in both repositories.

Details of packs are a weak area for me in Git, and I suspect that the situation may well be obvious. I can look into this further, and I'm willing to do so, but I figured I'd post that now in case the cause can already be inferred.

Is the difference due to different ways of computing deltas that, when resolved, give the same objects? That is, do the packs have the same objects, differently deltafied?

@Byron
Copy link
Member

Byron commented Sep 24, 2024

Thanks for sharing additional options!

For now I also feel to just keep it the way it is, but gain a better understanding of how scripts can be made (more) idempotent.

Is the difference due to different ways of computing deltas that, when resolved, give the same objects? That is, do the packs have the same objects, differently deltafied?

Yes, that would be my guess as well. Besides that, I am surprised that this operation isn't deterministic, and something we will have to live with.

There is a way to disable deltification for the entire blob-content of a repository using * delta in .gitattributes, but there is still trees and commits to deal with. Maybe there are other ways to discourage deltification in git gc invocations.

@EliahKagan
Copy link
Member Author

EliahKagan commented Sep 24, 2024

Since it is possible to have a regression where gitoxide does not properly resolve deltas, I'm not sure if we should try to avoid deltafication in repositories created by fixture scripts. I wonder, though, if running git gc --aggressive at the end of fixture scripts would be sufficient to make it so deltafication is done the same almost every time. From git-gc(1):

When the --aggressive option is supplied, git-repack[1] will be invoked with the -f flag, which in turn will pass --no-reuse-delta to git-pack-objects[1]. This will throw away any existing deltas and re-compute them, at the expense of spending much more time on the repacking.

But even aside from how this would slow down the fixture scripts, I am not at all sure this would be worthwhile.

@Byron
Copy link
Member

Byron commented Sep 25, 2024

That's a good point, some fixtures in gix-pack are likely to rely on delta-compression.

But even aside from how this would slow down the fixture scripts, I am not at all sure this would be worthwhile.

If it would make the output of scripts more stable in terms of the files it produces, I'd like it, particularly if these are used read-only. After all, these scripts are rarely big or have much actual content, so compression will be really fast no matter what.

If you think it's not worth it, I'd go with that - after all, delta-compression is just a part of the problem which by itself isn't reasonably solvable due to the reliance on the filesystem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants