Slow operations on very large repos #1841
What's the timing with that?
When you run a command in a colocated repo (created via `jj init --git-repo=.`)?
Given that there are "hundreds of thousands of tags" (don't ask, don't ask, don't ask), it seems like something that absolutely affects the status :) So yeah, can't wait to know what it is with …
Okay, so this is interesting:
The underlying performance seems to be pretty consistent across these:
SUUUUPER fast. Hyperfine says 263.8±4.1ms. Also, zero difference on that between colocated and non-colocated.
👍🏼
Weirdly, … Happy to keep providing details/etc.!
So, a fair amount of time would be spent importing refs.
No idea what happened for the first run. The current (unreleased) version of `jj` …
In this case, every tag points expressly and specifically to a commit, because every commit on the trunk branch is tagged.¹ So it should help, in theory… but in practice it does not seem to have done so. 🤔
Perhaps we could also point Watchman at the Git ref files/directories, so that we could at least skip importing refs when none of them have changed (or something more ambitious where we import refs selectively based on which files have changed).
Maybe we can also save & compare …
I have a similar issue with the Fuchsia repository.
During this command, the snapshotting output shows it's spending a lot of time in … If I delete …
Ignoring the working copy helps a lot:
Using the working tree method:
That's not horrible, but you're left without a … The help text for … mentions the snapshot going stale.
I'm unclear how the snapshot can go stale. I don't mind aliasing … NOTE: I am using a GCP Virtual Desktop, so the backing networked SSD isn't all that fast. The issue may be lessened if working with local NVMe. What to do? :)
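If aliasing is the route taken, here is a sketch of what that could look like in jj's config. The `[aliases]` table is jj's documented alias mechanism and `--ignore-working-copy` is jj's global flag for skipping snapshotting; the alias name `st` and the assumption that global flags work inside alias definitions are mine, not from this thread:

```toml
# ~/.jjconfig.toml (or your platform's equivalent config file)
[aliases]
# Hypothetical alias: a status command that skips snapshotting entirely
st = ["status", "--ignore-working-copy"]
```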
PS: I installed …
Well, … Since any command snapshots the WC beforehand, it never sees it being stale, but something like … If you understood my comment better than the help text, feel free to suggest better wording :)
edit: ugh, I see now that your question was related to watchman and you probably did know everything I re-explained here :|
Whether you use …, make sure that you set …
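The setting being referenced is presumably jj's filesystem-monitor option; a sketch, with the key name taken from jj's documented Watchman integration (not spelled out in this thread):

```toml
# ~/.jjconfig.toml — enable the Watchman filesystem monitor
[core]
fsmonitor = "watchman"
```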
So, to be clear, Watchman doesn't cause snapshots to be taken when it notices that something changed, correct? It only keeps track of the changed files to tell `jj` about at the next snapshot?
Good to know! I wish this was one of the …
Excellent, I think I needed …
I will give that a try on the work machine tomorrow—I'd love to be able to start using it there, because after the last month Git feels janky as heck every time I use it. 😂
I posted in Discord https://discord.com/channels/968932220549103686/969291218347524238/1129516951706816532 but should post here as well. Here's a … Interesting segments:
@martinvonz is working on tree-level conflicts, which should take care of the biggest bottleneck. I think we can cut ~90ms if we stop storing file states in the tree-state proto for the Watchman case. With some additional feature work, we could possibly reduce …
Related: snapshotting adds significant overhead for …
This is just using the v0.8.0 mainline, no …
@ilyagr That's correct. One way is to launch a daemon and use a Watchman subscription: https://facebook.github.io/watchman/docs/cmd/subscribe. Actually, it seems that Watchman has a …
TIL hyperfine accepts multiple commands to benchmark 🤣. It's worth noting that …
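For example, a single hyperfine invocation can compare several commands and print a relative-speed summary (a sketch; assumes hyperfine, git, and jj are all installed and the current directory is a colocated repo):

```shell
# hyperfine runs each quoted command repeatedly and compares them
hyperfine --warmup 2 \
  'git status' \
  'jj status' \
  'jj status --ignore-working-copy'
```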
Encountered this and can confirm that with:
Both the …
The work on changing how conflicts are stored is now pretty much done. You can set `format.tree-level-conflicts = true` to try it out.
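That setting (named explicitly later in this thread) lives in jj's config file; a minimal sketch:

```toml
# ~/.jjconfig.toml — opt in to the tree-level conflict format
[format]
tree-level-conflicts = true
```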
With #2232 merged, you should see significantly better performance in fresh clones of large repos. For example, I timed …
@martinvonz is it recommended to do a fresh re-clone of a large repo?
I think that depends on how often you want to look at old commits. New commits will use the new format once you've set `format.tree-level-conflicts = true`, but you'll need to re-clone (with a version built after #2232) to get the speedup on commits that are already in the git repo.
Does this mean watchman is no longer required for meaningful work on large repos?
No, it doesn't mean that. Watchman helps with snapshotting the working copy by keeping track of which files have changed between two consecutive snapshots. Tree-level conflicts make it faster to determine which paths have conflicts (and, importantly, faster to determine when there are no conflicts).
I ran into a bug yesterday that's most likely caused by tree-level conflicts. I resolved a conflict in one commit and squashed the resolution into it. There were still some descendant commits that were shown as conflicted in …
Is it normal to see …
And it almost feels like the fsmonitor doesn't help much in either call; it's not very slow, but there is a noticeable delay.
I think that's just saying that we're initializing the connection to watchman. The process is still running between calls, right? |
FWIW I only see one watchman process that is persisting across invocations.
I tried uninstalling watchman and timing the …
And then I installed watchman and made sure it's queried: …
If you're curious what's taking time, you can try profiling using e.g. samply. Just install it with …
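A sketch of what that workflow might look like (assuming samply is installed via cargo; `samply record <command>` is samply's standard usage, but the exact invocation was cut off in this thread):

```shell
# Install the samply profiler (assumes a Rust toolchain is available)
cargo install samply
# Record a profile of a jj invocation; samply opens the result
# in the Firefox Profiler UI when the recording finishes.
samply record jj status
```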
Wow, this tool is pretty cool. I was trying … And here is the samply profile: https://share.firefox.dev/3LpIbYO Both of them seem to point to the …
Ah, that confirms one of my suspicions - that importing refs from git takes a lot of time. When you're in a colocated repo, every `jj` command imports refs from the git repo.
I see. That makes sense. The reason I need a colocated repo is that some of our team's scripts make assumptions about git, like …
If you have tons of refs under `.git/refs/`, packing them with `git pack-refs` might help.
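A small demonstration of the pack-refs suggestion (assumes only that git is installed; the tag count is scaled down from the "hundreds of thousands" in this thread): packing turns thousands of loose ref files into a single `.git/packed-refs` file, so a ref scan touches one file instead of many.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "initial"
# Simulate "a tag per commit" on a smaller scale
for i in $(seq 1 50); do git tag "v0.0.$i"; done
echo "loose tag files before: $(find .git/refs/tags -type f | wc -l)"
# Move all loose refs into the single packed-refs file
git pack-refs --all
echo "loose tag files after: $(find .git/refs/tags -type f 2>/dev/null | wc -l)"
```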
As suggested by @yuja in jj-vcs#1841 (comment)
As suggested by @yuja in jj-vcs#1841 (comment) Thanks to @lazywei for pointing out that `git pack-refs --all` is better, at least on the first run. I haven't checked, but I suspect that, because of the number of `refs/jj` refs jj creates, it might always be better.
Another regularly scheduled performance update: I am testing an array of memory-related optimizations and changes in #2503 — in some cases, for large repositories, these changes will improve performance by up to 2x, i.e. the same operation will complete with the same output while using 50% of the original wall-clock time. This includes operations like …
I am going to keep iterating on this branch, as I don't expect it to go in immediately, and I will keep testing new changes. The goal is for every change to go upstream, and to do it piecewise. So you should consider this a publicly available testing branch, not a traditional PR. Some changes may or may not improve performance (i.e. they may only improve observability), but the goal is for every change to result in a net ~0% runtime increase, at the minimum. If you aren't afraid to compile from source code, please give it a try and report back with the Commit ID of the …
EDIT: Something like this should get you going:
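The edited-in instructions were lost; presumably they looked something like this (a sketch, assuming git and a Rust toolchain; `pull/<n>/head` is GitHub's standard PR head ref, and the local branch name `perf-testing` is made up here):

```shell
# Build jj from the testing branch in PR #2503
git clone https://github.com/martinvonz/jj.git
cd jj
git fetch origin pull/2503/head:perf-testing  # fetch the PR head
git switch perf-testing
git rev-parse HEAD         # the commit ID to include in any report
cargo build --release      # binary ends up at target/release/jj
```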
is it reasonable to expect that operations like … will be slow? i'm using …
the repo i'm working with is pretty large, but i looked through the issues & haven't seen anyone specifically calling out … for reference, after running `git count-objects -vH`:

```
count: 0
size: 0 bytes
in-pack: 1444949
packs: 2
size-pack: 6.79 GiB
prune-packable: 0
garbage: 1
size-garbage: 355.00 MiB
```
I also experience slow pushes on large repos (Nixpkgs).
Perhaps profiling the push using the suggestions from https://github.com/martinvonz/jj/blob/main/docs/contributing.md#profiling might indicate something? I wonder if the safety checks from #3522 might need optimizing of some sort, but that's just because it's the last thing I know changed with …
Does …?
A little: …
I wonder if it's the number of refs that's the problem. What does … show?
That's not very much, so it's probably not the problem. Is regular …?
Thanks for checking! Perhaps it's some performance bug in libgit2's push code then. I don't have any other ideas anyway.
Last time I checked, …
Description
Right up front I want to acknowledge: (a) this is definitely an unusual situation, and (b) I totally get that it is likely to take a bit to sort through. But: I tried out Jujutsu on a very large repo from work a few minutes ago and found it's distinctly not yet ready to use there:
- `jj init --git-repo=.`
- `jj status`
(I'll add more operations to this list once I'm actually back at work in August!)
For scale: this repo has on the order of 3M LOC checked in—primarily JavaScript, TypeScript, and Handlebars, but with a mix of Java and Gradle as well, with a massive `node_modules` directory and a not-small bucket of things related to Gradle (both `.gitignore`'d buuuut still massive), and it has hundreds of thousands of commits in its history, hundreds of active branches… and, annoyingly, also hundreds of thousands of tags (one for each commit; better not to ask).

For comparison, `git status` takes a second or two (again, I will time them when I'm back at work). I'm not using a sparse checkout here (other folks sometimes do, but for various reasons it's a non-starter for me 😩).

Comparable open source repos might be something like Firefox or Chrome? I tried DefinitelyTyped, and its 3M LOC and mere 84,275 commits only took 9s to initialize, and `jj status` took around a second. Even so, the comparable scale of the codebase itself and dramatically better performance suggests there may be something repo-specific (the tags?) causing the issue.

Steps to Reproduce the Problem
… `git` … `.jj` …

Expected Behavior
It completes in a reasonable amount of time.
Actual Behavior
It completes in what honestly probably is a reasonable amount of time given the sheer scale of the things, but in a way that makes it much worse than Git for the moment.
Specifications