Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: drop built-in pager_handle when appropriate #4187

Closed

Conversation

ErichDonGubler
Copy link

The :builtin pager currently has some undesirable behavior when a user is attempting to quit minus paging via keybindings (i.e., q by default). This happens whenever jj has not yet sent all output to the pager thread; it finishes generating all output before heeding the user's request to stop the command. For commands like jj log --revision "..trunk()" in enormous repositories like mozilla-central (where Firefox is hosted), this can take a nontrivial amount of time, and is almost certainly not what the user wants.

Notably, this is a major difference between the built-in and the external pager implementations. Exiting before jj sends all output to less exits more-or-less immediately, with jj also exiting more-or-less immediately with BROKEN_PIPE_EXIT_CODE. This makes sense, as child process input is indeed a broken pipe. This seems like the behavior we should use for the built-in pager, too.

Solution: interrupt jj's loop that sends output to the built-in pager by placing the existing BuiltinPager::pager_handle behind Arc<RwLock<Option<…>>> layers, and:

  1. Discarding pager_handle in contexts where we know we should stop sending output. To wit, these are:

    1. In an exit callback registered with the minus pager.
    2. When the minus paging thread ends. This is technically redundant with (1), but doesn't seem to hurt anything.
    3. When BuiltinPager::finalize is called.
  2. Forcing <BuiltinPager as Write::write to gain a read lock on pager_handle's contents, otherwise returning a broken pipe error.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

The `:builtin` pager currently has some undesirable behavior when a user
is attempting to quit `minus` paging via keybindings (i.e., `q` by
		default). This happens whenever `jj` has not yet sent all output to the
pager thread; it finishes generating all output before heeding the
user's request to stop the command. For commands like `jj log --revision
"..trunk()"` in enormous repositories like [`mozilla-central`] (where
		Firefox is hosted), this can take a nontrivial amount of time, and is
almost certainly not what the user wants.

[`mozilla-central`]: https://hg.mozilla.org/mozilla-central

Notably, this is a major difference between the built-in and the
external pager implementations. Exiting before `jj` sends all output to
`less` exits more-or-less immediately, with `jj` exiting with
`BROKEN_PIPE_EXIT_CODE`. This makes sense, as child process input is
indeed a broken pipe. This seems like the behavior we should use for the
built-in pager, too.

Solution: interrupt `jj`'s loop that sends output to the built-in pager
by placing the existing `BuiltinPager::pager_handle` behind
`Arc<RwLock<Option<…>>>` layers, and:

1. Discarding `pager_handle` in contexts where we know we should stop
sending output. To wit, these are:

1. In an exit callback registered with the `minus` pager.
2. When the `minus` paging thread ends. This is technically redundant
with (1), but doesn't seem to hurt anything.
3. When `BuiltinPager::finalize` is called.

2. Forcing `<BuiltinPager as Write::write` to gain a read lock on
`pager_handle`'s contents, otherwise returning a broken pipe error.
Copy link

google-cla bot commented Jul 31, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@ErichDonGubler
Copy link
Author

I'm happy to sign a CLA and check the remaining boxes in the PR template, just want to validate that this fix is interesting from maintainers before I go to the effort of doing so.

@ilyagr
Copy link
Contributor

ilyagr commented Jul 31, 2024

This sounds useful to me. I'm not an expert on multi-threading, but I'll take a look at the code.

I'm also hoping that your approach will solve #4182 (comment). If dynamic_pager fails, it might allow jj to not panic but instead notice the error relatively early, right? (Well, actually it's still a bit fishy as far as I can tell. In any case, let me know if you have any comments about that)

Signing the CLA would help, especially for the Googler reviewers. Checking the boxes is only needed if applicable; I'd add a Changelog entry. If you can write a test, that'd be cool and I'd probably learn something from it, but I think it'd be non-trivial and isn't a requirement for this.

@ErichDonGubler
Copy link
Author

CLA has been signed.

@ErichDonGubler
Copy link
Author

WRT testing: If one could mock up a test where output is being sent indefinitely to the built-in pager, one could test that a BrokenPipe IO error is received and that BuiltinPager::finalize terminates after q user input is sent to the pager. Not sure how to do that right this second, but I would be happy to try (tho maybe as follow-up)

Copy link
Contributor

@ilyagr ilyagr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. Thank you!

I'd feel mildly more comfortable if somebody took a second look, since I'm mildly afraid of multi-threading, but if everybody else is busy I'll approve this in a day or two.

@@ -45,7 +46,7 @@ enum UiOutput {

/// A builtin pager
pub struct BuiltinPager {
pager: MinusPager,
pager: Arc<RwLock<Option<MinusPager>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment:

Option<MinusPager> is set to None after the pager quits (e.g. the user quits the pager early).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! This is one of the things I was hoping to do before taking this out of draft. 🙂 Will attend to this soon.

.add_exit_callback(Box::new({
let pager_handle = pager_handle.clone();
move || {
pager_handle.write().unwrap().take();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to disable late write()s, we don't have to drop pager object. Instead, maybe we can use AtomicBool to be set on pager quit.

BTW, this PR revealed that the current builtin pager is fundamentally broken. Because the backing channel is unbounded, .write() will never stuck and burn CPU time.

Copy link
Contributor

@ilyagr ilyagr Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the backing channel is unbounded, .write() will never stuck and burn CPU time.

(Just curious) What do you mean? Are you talking about the problem this PR solves or about a separate problem? Are you talking about something we do or minus's design?

Copy link
Contributor

@ilyagr ilyagr Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, AtomicBool sounds fine to me as far as this PR goes, but I'll mention that I was considering replacing the Option here with a Result later for #4182 (comment). I am not 100% sure that's the best solution to that problem, but if we went with that approach, it would need preserving more information than just one AtomicBool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kinda the same problem under the hood, but user might not notice. Basically, if the pager is minus, jj log -r.. -p will scan all ancestors, materialize diffs, and the whole output will be buffered in memory.

Our expectation is that .write() into the pager will stuck if the pipe is full, but minus doesn't because the underlying channel is unbounded.

Copy link
Contributor

@ilyagr ilyagr Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be symptoms of this? Running out of memory? Something more serious?

Would anything short of redesigning minus help? Though, they might accept a PR to block on writes at some limit and return BrokenPipe after quitting.

I think the pager Sapling used, streampager, was more carefully designed for huge inputs, but it seemed abandoned last I checked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuja: It's true that we could use AtomicBool for signaling here, but I see no advantage to doing so, and a few advantages in keeping the current Option container:

  1. If we don't intend to use the pager anymore, this makes it inaccessible by design. It matches our intent; I think it will be far less surprising for those unfamiliar with this code to see this handle being discarded than an AtomicBool as a peer.
  2. Because minus' pager handle is implemented as a sender and receiver channel pair for duplix comms. with the paging thread, dropping pager_handle's contents should free the CLI thread's Receiver's resource (which is very relevant to the above discussion about resource usage with lines that haven't been displayed yet). It's unclear to me what comms. might be queued up in return from the paging thread, but this sidesteps any question of whether they're significant.

Copy link
Contributor

@yuja yuja Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I would say AtomicBool is simpler and that's the advantage.

As far as I understand the implementation, minus::dynamic_paging() doesn't drop their tx end, so dropping our pager instance wouldn't signal the termination. Ideally, all rx ends should be dropped on pager quit, so any further pager.push_str() automatically fails. However, minus API appears not designed with such use case in mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the point that AtomicBool avoids the possibility of deadlocks? While it's more primitive, that makes its docs more complicated (https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html, for example), so I couldn't immediately figure that out.

Like Erich, I liked the idea of making invalid states inexpressible, but I'm not sure about the downsides of RwLock and whether they are worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the point that AtomicBool avoids the possibility of deadlocks?

No? I just mean boolean flag is easier to follow than heavy lifting on interior mutability.

BTW, judging from the discussion on Discord, it might be better to just switch to streampager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone is reading this, here is what seems like a nice book that discusses Atomic types:

https://marabos.nl/atomics/atomics.html#atomic-load-and-store-operations

@yuja
Copy link
Contributor

yuja commented Aug 4, 2024

Sent simpler version as #4209

@ErichDonGubler
Copy link
Author

Dropping in favor of #4209.

@ErichDonGubler ErichDonGubler deleted the push-kzxkyowkkszm branch November 7, 2024 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants