Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: jj bisect, a general purpose bisection command. #2987

Open
PhilipMetzger opened this issue Feb 7, 2024 · 16 comments
Open

FR: jj bisect, a general purpose bisection command. #2987

PhilipMetzger opened this issue Feb 7, 2024 · 16 comments
Labels
enhancement New feature or request

Comments

@PhilipMetzger
Copy link
Contributor

PhilipMetzger commented Feb 7, 2024

Is your feature request related to a problem? Please describe.
We're currently missing a bisection command which is freestanding from #1869, so something like jj bisect.

Describe the solution you'd like

Add jj bisect with a option to integrate with #1869, e.g jj bisect --run '<command>'.

This was previously discussed in Discord about ~9 months ago and it was brought up that it was important for Google to have a separate command, as there are many commits in-flight there.

We also should look to integrate with scm-bisect.

@PhilipMetzger PhilipMetzger added the enhancement New feature or request label Feb 7, 2024
@martinvonz
Copy link
Member

For anyone thinking of working on this, note that we'll want to implement by adding a method on the Revset trait. That way it can be performant on very large repos without walking the entire revset. Oh, I just realized that another option is to make it a revset function. Then we could do something like bisect(<good commit>::<bad commit>, 4) to produce 4 points between the good commit and the bad commit. Note that normal bisection would pass 1 there (so the example with 4 is really a pentasection or whatever it might be called).

@arxanas
Copy link
Contributor

arxanas commented Feb 11, 2024

Some useful extensions for bisection:

  • Minimize the set of commits still satisfying the predicate.
    • This is particularly useful in the merge commit case where some non-obvious combination of commits from both branches causes the problem.
    • I was working on test-case minimization for scm-bisect but didn't publish anything yet.
  • Apply a commit/patch/fix/test script etc. before running the predicate.
    • Technically there's no reason you can't make this part of the test command, but it happens often enough that it might be nice to include as its own feature for convenience.

Oh, I just realized that another option is to make it a revset function. Then we could do something like bisect(::, 4) to produce 4 points between the good commit and the bad commit. Note that normal bisection would pass 1 there (so the example with 4 is really a pentasection or whatever it might be called).

In the latest scm-bisect, the search strategy trait requires only being able to bisect a single commit set/find its midpoint. Then the scm-bisect search code will automatically speculate / $n$-sect the rest of the search space as desired by recursively bisecting.

  • Compared to $n$-section, the recursive bisection in scm-bisect is not optimally-sized except when $n$ is a power of 2.
    • But asymptotically, it probably works out to be roughly the same — changing the base of the logarithm should be a constant factor difference?
    • Besides, who is out there setting $n = 17$ for their search space anyways? 😛
  • Even if we chose to do general $n$-section, it's not automatically "adaptive": the available concurrency may vary at runtime, and we may gain new information during search that would let us cancel some jobs.
    • If you pentasect the search space because you have four available workers, and two of them find a smaller section to search, the remaining two jobs may still be running and you may not actually have four workers immediately available to search the refined space. At that point, you'd want to at least direct the two available workers to search the newest space, and if more workers become available later, direct them to also search somewhere in that space, without cancelling the current worker jobs or repartitioning the search space.
    • The design implication is that $n$ may not be fixed during runtime: we should be able to handle having fewer workers or gaining more workers dynamically.
    • scm-bisect will let you speculate with an arbitrary number of workers simultaneously.
  • It turns out implementing $n$-section by one's self is surprisingly likely to be buggy. In a previous version of the search::Strategy trait, it was indeed responsible for $n$-section/speculation and it was 1) hard to implement and 2) buggy.
    • I used some property-based testing for scm-bisect, which caught one or two bugs. It would be best if we could isolate thorough testing to a small unit instead of putting it directly in jj — at least for as long as feasible.
    • I also tried formal verification with Kani and maybe one other tool, but they choked on the codebase.

So I think we should punt on "true" $n$-section indefinitely and stick to recursive bisection.

@CelestialCrafter
Copy link

any updates on this?

@martinvonz
Copy link
Member

martinvonz commented Nov 21, 2024

I don't think anyone is working on it. In case you or someone else is thinking of working on it, I'll mention that there are some open questions:

  • Should the bisection state be per workspace? I think it should.
  • Should bisection state be version controlled in the repo/operations? That would enable undo, which seems useful. I have often wanted to undo a hg bisect --good/--bad.
  • Should we allow bisecting over a filtered range of commits? If you know that the bug is in lib/ somewhere, it seems useful to be able to bisect files("lib/"), especially for large repos.
  • If we base the bisection state on a revset function, then when do we resolve that to commits? Let's say we've marked some commits as good and some as bad, the next commit to inspect would be bisect(<bad>..<good> & files("lib/")). I think that makes sense. My concern is if the user started bisection with something like v1.0..@ that can change over time (depending on where @ points in this case). Just trying to warn in some common cases seems good enough to me.
  • Should the bisect() function be allowed to return more than on commit? If you had determined that a fork point was good and a merge point was bad, it would make sense to test all branches between then concurrently (but it would be up to the jj bisect command whether to actually do that).
  • But asymptotically, it probably works out to be roughly the same — changing the base of the logarithm should be a constant factor difference?

Yes, I was thinking the user would just 4-sect for a 2x speedup (assuming enough resources to run 4 concurrent jobs) or 8-sect for a 3x speedup. More than that would probably usually not be worth it (and anything more than bisection might also often not be worth it).

So I think we should punt on "true" 𝑛-section indefinitely and stick to recursive bisection.

Sounds good.

@joyously
Copy link

Should bisection state be version controlled in the repo/operations?

Does this entail more than materializing different commits as the working copy?
Because you mentioned undo of marking a state good or bad, I'm wondering how it works in hg or git. Surely the state is saved, since bisection involves multiple commands.
This also made me wonder about viability of commits since there is auto-snapshot in jj. It would seem that not every commit could be tested (might not compile), so you need something besides "good" and "bad".

@martinvonz
Copy link
Member

Does this entail more than materializing different commits as the working copy?
Because you mentioned undo of marking a state good or bad

Correct, it would include good/bad state.

I'm wondering how it works in hg or git. Surely the state is saved, since bisection involves multiple commands.

Right, it's still stored somewhere in .hg/, .git/, but it's not version controlled, at least not in the case of hg. Maybe git uses reflogs or something. I haven't checked.

This also made me wonder about viability of commits since there is auto-snapshot in jj. It would seem that not every commit could be tested (might not compile), so you need something besides "good" and "bad".

That's true regardless. Both git and hg have an option to skip a commit if you can't determine if it's good or bad.

@arxanas
Copy link
Contributor

arxanas commented Nov 23, 2024

@CelestialCrafter: any updates on this?

If you or someone else wants to work on this, you can take a look at

Also, let me know and I'll push some recent work/changes to scm-bisect which might make the interface easier to work with (including primarily stuff on arxanas/git-branchless@master...arxanas/wip/minimize-3).

In the below comment, I've also discussed how git-branchless/scm-bisect handle things, which will hopefully be useful for a future implementer.


@martinvonz: Should the bisection state be per workspace? I think it should.

I think so as well.


@martinvonz: Should bisection state be version controlled in the repo/operations? That would enable undo, which seems useful. I have often wanted to undo a hg bisect --good/--bad.

  • Ultimately, I think this is the right long-term design.
    • We especially shouldn't put magic state in a random directory like Git does.
    • There's the obvious UI problem that needs to be solved where you would be modifying working copy state while also switching to other commits in that same working copy, and you need to reconcile those somehow. I have some ideas, but ideally we skip solving this problem for the first draft.
  • However, for an initial implementer, I would recommend that you implement in-memory state only with an automated command first (like git bisect run), and just wrap this in a helper interactive mode where the automated command is a user shell (like the equivalent of git bisect run bash).
    • This lets us defer a huge amount of complexity around the persistent state and UI, and work on the fundamental bisection design and implementation.
    • This is what git-branchless does. (There's no screenshots in those docs for that mode, but it will automatically check out the next commit and print an instruction block for the user telling them how to mark that commit as good/bad/skip/abort via exit N.)

@martinvonz: Should we allow bisecting over a filtered range of commits? If you know that the bug is in lib/ somewhere, it seems useful to be able to bisect files("lib/"), especially for large repos. [...] Just trying to warn in some common cases seems good enough to me.

I think so.

  • Why wouldn't we allow it? I can imagine some reasons you might have in mind, but you didn't specifically mention anything:
    • You might be imagining that it's hard to implement bisection over a non-linear range (such as with merge commits).
    • You might be imagining that it's hard to implement bisection over a non-contiguous range (such as with filtered-out commits).
    • You might think that filtering by certain conditions (such as file(...)) is unrealistic for bisection over ranges at Google scale.
  • You should get handling of non-linear, non-contiguous subgraphs "for free" when integrating with scm-bisect via BasicSourceControlGraph.
    • It should scale fine for practically all OSS users.
  • When integrating with scm-bisect with a custom search::Strategy implementation, you can do one of the following:
    • Let midpoint return filtered commits. Then, when processing a commit, check to see if it's filtered out, and, if so, inform scm-bisect that it was skipped. Then continue in the next iteration of the search loop and ask scm-bisect for the next commit to search, as normal.
    • Define midpoint in such a way that it never returns a filtered commit. I don't recommend this. It turned out to be incessantly buggy when I tried doing this in certain cases. I changed scm-bisect to do this for you in the above setup plus its speculation mechanism, which is a lot better test/validated than the ad-hoc midpoint definitions I had written before. (But it might be necessary at Google-scale for performance. I'm trying to change it so that it's not necessary even at those scales.)
  • IIRC hg bisect has different flags for "the commits to check" and "commits to filter", which seems like unfortunate UI design.
    • As a user, I don't really want to think about separating these two concepts to represent the one set of commits I want to search.
    • I would really try to avoid this and have a single flag to represent both, if possible.
    • git-branchless only accepts a single search set, as desired. (Possibly there are issues at larger scales?)

I'll also mention that there's two main user workflows for bisection/search:

  • Testing local commit stacks as part of normal development. In these cases, a subset of users can get a lot of value even if the bisection solution doesn't scale well.
  • Testing long ranges of historical commits to track down a regression. scm-bisect should scale to most OSS repos, but probably not Google in its current form. (The development work I mentioned previously help it perform at Google-scale.)

We don't necessarily need a fully-scaling solution to begin with (and hopefully the abstractions defined by scm-bisect would also allow Google to slot in a higher-performance bisection backend without having to make considerable code changes.)


@martinvonz: If we base the bisection state on a revset function, then when do we resolve that to commits?

Is there a reason not to resolve it immediately when the bisect is initiated?

  • It would be surprising to me if the revset I'm searching changed as I searched.
    • Honestly, my intuition is such that I don't think it would even warrant warning e.g. if I used @ in a bisection pattern. I would expect the early-evaluation semantics in all cases. Maybe others feel differently?
      • Fun fact: in git-branchless, the most recent result of running a test/search command on a commit can be accessed via the revset language, so you could theoretically encode almost-self-referential search sets like "the set of all commits for which the search command fails". It would be a lot easier to think about practically when you're only considering the last "snapshot" of times when the search command was run on those commits, rather than a set that can change as you're trying to search it.
    • Perhaps you're thinking of the lifetime of the search as part of a sequence of jj bisect command invocations, in which you're imagining that the revset would need to be persisted to disk and reevaluated later?
    • For similar reasons as before, I would again recommend doing an initial implementation with in-memory state only, and figuring out the persistence situation later once the general design and implementation are more fleshed out.
  • Perhaps you're worried about resolving certain patterns immediately (like & file(...)) is too expensive at Google-scale?
    • I think there might be a way around this where we compute the heads and roots of the user-provided range in an efficient way to start, then bisect within that full range but filter each bisection point against the full pattern at each step. scm-bisect supports this pattern as mentioned previously.
    • The Sapling revset system seems to be structured to support this pattern easily, but I'm not sure if the jj one is (or at least the OSS one).

@martinvonz: Should the bisect() function be allowed to return more than on commit? If you had determined that a fork point was good and a merge point was bad, it would make sense to test all branches between then concurrently (but it would be up to the jj bisect command whether to actually do that).

[practical answer] Ideally, bisect would return multiple equally-good midpoints (see note below). But for implementation purposes, it's fine to just have it return one equally-good midpoint arbitrarily and scm-bisect will take care of "discovering" and suggesting other equally-good midpoints via its speculation mechanism.

  • git-branchless does this today via suggesting a single midpoint and using scm-bisect's speculation mechanism to suggest which commits should be tested in parallel.
    • Then it actually runs the test commands in parallel via its own command-execution infrastructure.
    • Ideally, our bisection command would one day be able to reuse jj run's infrastructure (see FR: Implement JJ run  #1869) to execute user commands in parallel.
  • The main invariant is that midpoint cannot return a commit that's already been searched. Hence, after searching a certain midpoint, if there's another equally good midpoint in the search set, it must be suggested next by the midpoint function.
  • For reference, here's the "standard" midpoint calculation procedure for an arbitrary directed graph.
    • Look at each node and calculate the number of ancestor and non-ancestor nodes in the search subgraph, compute the absolute value of the difference, and return the node(s) with the least value.
    • scm-bisect contains a helper implementation for this when you implement BasicSourceControlGraph.
    • If a node has already been searched, then we simply need to exclude it from consideration or skip it.
    • For a graph with many "linear" segments (or an entirely linear graph) of known size, you can imagine optimizations to avoid calculating the value for more than one node in each segment.

[theoretical answer] I ended up deciding that scm-bisect's Strategy::midpoint needs to be changed to return multiple equally good commits (i.e. become Strategy::midpoints, which is implemented WIP in this commit).

  • However, I only decided that it needed to do that in order to generalize/lift to a higher search space and handle practical "minimization" cases (which, given a commit set $X$, effectively searches $\mathcal{P}(X)$ instead of just $X$).
  • In those cases, there's on the order of $|X|$ "midpoints" between $\varnothing$ and $X$, specifically all $x \in \mathcal{P}(X)$ with $|x| = |X| / 2$ (can't get set builder notation to render on GitHub for some reason).
  • The search space induced by $\mathcal{P}(X)$ is a complete graph under the subset ordering relation. But normal source control graphs are nowhere near complete; I would expect even fairly "merge-y" graphs to not have more than 1-2 midpoints for most bisection steps. So we don't really need the multiple midpoints generalization for practical cases.

[aside] By the way, I don't think there's any reason to expose a literal bisect revset function to the user, but I assume you're primarily using it as a hypothetical device to refer to the abstract "bisection" concept.


@joyously: Does this entail more than materializing different commits as the working copy? [...]
@martinvonz: Right, it's still stored somewhere in .hg/, .git/, but it's not version controlled, at least not in the case of hg. Maybe git uses reflogs or something. I haven't checked. [...]

  • Git stores the current bisection state in special refs, something like refs/bisect/good/<hash>, etc. There might be some other state file beyond the marked commits.
  • For interactive runs, git-branchless only stores bisection state in-memory for now, and relies on subshells for the user to drive the search. I think it would be nice to support importing/exporting state to a file.
  • For non-interactive runs, git-branchless incidentally stores the bisection state in its normal run cache, which is nice because it lets it reuse cached results even across unrelated bisections/searches, and lets you clear the cache for certain commits via its general mechanisms.

@joyously This also made me wonder about viability of commits since there is auto-snapshot in jj. It would seem that not every commit could be tested (might not compile), so you need something besides "good" and "bad". [...]
@martinvonz: That's true regardless. Both git and hg have an option to skip a commit if you can't determine if it's good or bad. [...]

scm-bisect currently has this Status enum, which includes "skipped" and "untested" as statuses.

  • Besides build issues, another reason to skip commits is to satisfy any extra filters/restrictions on the search space, if necessary.
  • I even found theoretical use-cases for adding additional flags "skip ancestors" and "skip descendants" to the status, but haven't found the need to implement them yet. That was mostly for the "minimization" vs the "search" use case, where the search graph becomes exponential in size, and pruning it aggressively is more important.

[related note] git bisect does some strange things to handle skipped commits, where it tries to find a commit that's a non-deterministic distance away from the skipped commit (see https://git-scm.com/docs/git-bisect-lk2009/2.6.7#_skip_algorithm). While I agree with the statement "So it is a fact that commits near an untestable commit have a high probability of being untestable themselves", I'm skeptical of the Git approach and proposed a different approach in arxanas/git-branchless#810, which is probably doable now that I've implemented speculation for scm-bisect, but I haven't validated that this particular issue is fixed yet.

@ilyagr
Copy link
Contributor

ilyagr commented Nov 23, 2024

For naming, I like how git branchless makes bisection a part of1 git test. git test can do bisection, but can also evaluate each commit regularly (it also overlaps with what we'd call jj run, but this is something I am less interested in borrowing). So, I wonder if we could borrow the name and call it jj test.

Footnotes

  1. At least, it's one of the methods available in git test. I'm not sure if there's a more specialized scm-bisect command.

@arxanas
Copy link
Contributor

arxanas commented Nov 23, 2024

For naming, I like how git branchless makes bisection a part of1 git test. git test can do bisection, but can also evaluate each commit regularly (it also overlaps with what we'd call jj run, but this is something I am less interested in borrowing). So, I wonder if we could borrow the name and call it jj test.

I'll be honest, my names have turned out to be pretty awful in practice, so I'd stay away from a name that I invented just by default 🤣. (Examples: git-branchless, scm-record, scm-diff-editor have all been confusing or hard to remember; there was actually a thread somewhere where we confirmed that git test is a bad name empirically for people who want to run a command on commits.)

I originally stole the name from https://github.com/spotify/git-test, but then expanded the functionality.

The idea of unifying many run/check/search/fix workflows under some interface is probably worth exploring at some point, but probably with more care than I exercised.

  • I think I had made an argument somewhere (probably a jj fix) that "fixing" is not thought to be much different than "running" from the user's perspective, for example.
  • By the way, in git-branchless, it also supports linear and reverse-linear searching, not just bisection. (Those are distinct from running a command on each commit in a set without any early termination condition, which would be similar to the planned jj run, and is the default mode when no search strategy is specified.) It has a -b/--bisect flag which is just an alias for something like --search binary. The fact that search might be more general than strictly "bisection" would be worth considering in the above.

I'm not sure if there's a more specialized scm-bisect command.

scm-bisect is only a library and has no commands; possibly another instance of misleading or unhelpful naming?

@ilyagr
Copy link
Contributor

ilyagr commented Nov 23, 2024

where we confirmed that git test is a bad name empirically for people who want to run a command on commit

I'd be interested to know more about the reason. They are not immediately obvious to me, so it's valuable knowledge (to me, at least).

The fact that search might be more general than strictly "bisection" would be worth considering in the above.

Yeah, I tried to mention that. Your sentence also suggests jj search as another possible name, though it might not be immediately clear that it searches through commit graph. Maybe jj find-commit (though this doesn't have the implication of running a command at each commit that jj test would).

Update: It could also be jj run --test, I suppose.

@arxanas
Copy link
Contributor

arxanas commented Nov 23, 2024

I'd be interested to know more about the reason. They are not immediately obvious to me, so it's valuable knowledge (to me, at least).

This was the thread I was thinking of (5 respondents if you include reactions... including you!): https://discord.com/channels/968932220549103686/1273367388787839016/1274830013186969741

@ilyagr
Copy link
Contributor

ilyagr commented Nov 23, 2024

No, that's past me. A distant acquaintance, strange guy with strange opinions. 😝

Well, I guess past me and present me mostly agree on jj run test / jj run --test.

@martinvonz
Copy link
Member

  • Why wouldn't we allow it? [referring to bisection over specified revset]

It actually wasn't any of the reasons you guessed :) I was thinking of the additional complexity and of the risk of users shooting themselves in the foot by incorrectly assuming that something is a problem in cli/ when it's actually or lib/ or something like that. But I do think we should support it. I don't think those concerns are much to worry about.

You should get handling of non-linear, non-contiguous subgraphs "for free" when integrating with scm-bisect via BasicSourceControlGraph.

  • It should scale fine for practically all OSS users.

We want it to scale for non-OSS users too, but perhaps it could be used internally in the default index backend.

Is there a reason not to resolve it immediately when the bisect is initiated?
Perhaps you're worried about resolving certain patterns immediately (like & file(...)) is too expensive at Google-scale?

Yes, that is the reason. & files(...) is actually usually fine. The problem is mostly when you don't restrict the set, because then you might end up bisecting over a hundred million commits.

  • I think there might be a way around this where we compute the heads and roots of the user-provided range in an efficient way to start, then bisect within that full range but filter each bisection point against the full pattern at each step. scm-bisect supports this pattern as mentioned previously.

Let's say you have commits A through Z. You start with a working copy on top of commit Z. Now you bisect over the range ..@. jj bisect starts by creating a new working copy on top of commit M. If we continue bisection over the range ..@ as evaluated at this point, we'll miss the M..Z range. I can see how resolving the roots and heads at the start can help you not bisect the new working-copy commit but it doesn't seem to help with the problem that the new range is too small. Or did I misunderstand?

@arxanas
Copy link
Contributor

arxanas commented Nov 25, 2024

We want it to scale for non-OSS users too, but perhaps it could be used internally in the default index backend.

I think it should work out fine. The provided BasicSourceControlGraph is just a helper implementation for search::Strategy, but e.g. Google can provide its own (better-scaling) implementation of search::Strategy at the expense of more implementation effort.

I can discuss more + potentially redesign scm-bisect if necessary once we get there. And, of course, we can also just not use scm-bisect if it doesn't turn out to be helpful.

The problem is mostly when you don't restrict the set, because then you might end up bisecting over a hundred million commits.

git-branchless mostly addresses this via UI:

  • The default revset is stack() (corresponding to your local work), so bounded to a fairly small set by default.
  • When you start bisection, you get a display of the commits that are being searched, so it's obvious if you're searching many more or many fewer commits than intended.
    • Due to lack of time to implement otherwise, git-branchless shows every individual commit on a separate line in the auto-updating interface. We would probably want a much more condensed display that still includes the number of commits in the range, the number of commits left to search, a time estimate, etc.

With some good UI, hopefully it'll be apparent to the user when they try to bisect 100M commits 🙂

I can see how resolving the roots and heads at the start can help you not bisect the new working-copy commit but it doesn't seem to help with the problem that the new range is too small. Or did I misunderstand?

I was supposing that the problem with computing/resolving the revset up-front might be that evaluating a filter like & file(...) would be too expensive (the immediate parent bullet point), and proposing a potential solution for that, but it doesn't sound like that's actually an issue for you, so my proposal here shouldn't be necessary. It doesn't help or address the case of the value of the revset expression changing as you bisect (the original quoted text).

The problem is mostly when you don't restrict the set, because then you might end up bisecting over a hundred million commits.

Coming back to this point, I don't understand whether you're saying that this is a problem by itself, and whether this is an unaddressed problem. Does this mean that it's too expensive to even represent the 100M commits, i.e. evaluate a revset that's too large?

  • I had assumed that the revset evaluation would be able to address those commits symbolically and cheaply (similarly to Sapling's segments), and it would just be infeasible to actually execute the bisection over those commits (although $\log_2(100 \times 10^6) \approx 27$, so maybe not even that?).
  • Strictly speaking, the current scm-bisect design only requires memory linear in the number of tests actually performed, but the BasicSourceControlGraph helper is less efficient (for now?).

@martinvonz
Copy link
Member

With some good UI, hopefully it'll be apparent to the user when they try to bisect 100M commits 🙂

I don't mean that they shouldn't do that. I mean that we should support it without eagerly evaluating the set of 100M commits.

Coming back to this point, I don't understand whether you're saying that this is a problem by itself, and whether this is an unaddressed problem. Does this mean that it's too expensive to even represent the 100M commits, i.e. evaluate a revset that's too large?

Right, it's fine to represent a range like old_commit..new_commit and to start start evaluating it, but it might take a while to stream all the results. It might also run the process out of memory.

So my concern was with e.g. BasicSourceControlGraph::ancestors() returning what appeared to be a set of commits, but maybe I misunderstood what a Node is.

@arxanas
Copy link
Contributor

arxanas commented Dec 16, 2024

For my own future reference, there was a Discord thread to follow up on this topic. Some highlights:

  • The Google revset backend doesn't currently have a way to represent compact ranges like Sapling's segmented changelog, but I had assumed it did. (This might or might not be an actual implementation issue.)
  • The Sapling DAG library comes with a bisect algorithm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants