Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start migrating to Stream-based diff iterator #2501

Merged
merged 3 commits into from
Nov 3, 2023
Merged

Conversation

martinvonz
Copy link
Member

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@martinvonz martinvonz force-pushed the push-tokvzroztlxv branch 2 times, most recently from 2bd8eb5 to e60382f Compare November 2, 2023 12:52
@martinvonz martinvonz force-pushed the push-tokvzroztlxv branch 2 times, most recently from 947e30f to b998230 Compare November 2, 2023 22:28
During the transition to using more async code, I keep running into
rust-lang/futures-rs#2090. Right now, I want
to convert `MergedTree::diff()` into a `Stream`. I don't want to
update all call sites at once, so instead I'm adding a
`MergedTree::diff_stream()` method, which just wraps
`MergedTree::diff()` in a `Stream. However, since the iterator is
synchronous, it needs to block on the async `Backend::read_tree()`
calls. If we then also block on the `Stream` in the CLI, we run into
the panic.
I'm going to implement a `Stream`-based version optimized for
high-latency (RPC-based) commit backends. So far, that implementation
is about 20% slower in the Linux repo when running `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. I think that's almost
only because the algorithm is different, not because it's async per
se.

This commit adds a `Stream`-based version of `MergedTree::diff()` that
just wraps the regular iterator in stream. I updated `jj diff` to use
it. I couldn't measure any difference on the command above in the
Linux repo. I think that means we can safely use the same
`Stream`-based interface regardless of backend, even if we end up
needing two different implementations of the `Stream`. We would then
be using the wrapped iterator from this commit for local backends, and
the new implementation for remote backends. But ideally we can make
the remote-friendly implementation fast enough that we don't need two
implementations.
This will make it a little faster to update the working copy at Google
once we've made `MergedTree::diff_stream()` fetch trees
concurrently. (It only makes it a little faster because we still fetch
files serially.)
Copy link
Contributor

@yuja yuja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's good idea to switching all blocking callers to pollster, but I don't have a reason against that.

@martinvonz
Copy link
Member Author

I don't know if it's good idea to switching all blocking callers to pollster, but I don't have a reason against that.

I think I feel the same way. We'll see if run into a deadlock that futures::executor::block_on() would have prevented.

@martinvonz martinvonz merged commit 904c37d into main Nov 3, 2023
14 checks passed
@martinvonz martinvonz deleted the push-tokvzroztlxv branch November 3, 2023 15:15
@PhilipMetzger
Copy link
Contributor

Maybe they will then look at TreeDiffStream. I added a comment on that type. Let me know if that doesn't address your concern.

With that comment the motivation is clear. LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants