SimpleScheduler version matching uses Aborted to know if failure #1308

Merged

@allada allada commented Sep 1, 2024

If version matching fails for the scheduler that owns an operation, we
now return the Aborted error code to signal that the version check
failed and the update can be retried.

This is not a bug fix: the current in-memory scheduler already
guarantees protection here. The change is for lockless schedulers.
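
For readers unfamiliar with the pattern, here is a minimal sketch of a version-checked update that reports a mismatch as Aborted, and a caller that retries on that code. It is not the code from this PR: `Code`, `Error`, `Operation`, `Store`, `update_if_version` and `update_with_retry` are all hypothetical names invented for illustration, and the store is a plain in-process struct standing in for whatever backend a lockless scheduler would use.

```rust
/// Hypothetical error codes; only `Aborted` is named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Code {
    Aborted,
    Internal,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Error {
    pub code: Code,
    pub message: String,
}

/// Hypothetical operation record guarded by a version counter.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Operation {
    pub version: u64,
    pub stage: String,
}

/// Stand-in for a backend that supports "write only if the version matches".
pub struct Store {
    op: Operation,
}

impl Store {
    pub fn new(stage: &str) -> Self {
        Store { op: Operation { version: 0, stage: stage.to_string() } }
    }

    pub fn read(&self) -> Operation {
        self.op.clone()
    }

    /// Applies the update only when `expected_version` is still current.
    /// A mismatch means another writer got there first, so the caller is
    /// told to retry via `Code::Aborted`.
    pub fn update_if_version(&mut self, expected_version: u64, stage: &str) -> Result<u64, Error> {
        if self.op.version != expected_version {
            return Err(Error {
                code: Code::Aborted,
                message: format!(
                    "version mismatch: expected {}, found {}",
                    expected_version, self.op.version
                ),
            });
        }
        self.op.version += 1;
        self.op.stage = stage.to_string();
        Ok(self.op.version)
    }
}

/// Re-reads the operation and retries while the store answers `Aborted`.
/// Any other error code is returned to the caller immediately.
pub fn update_with_retry(store: &mut Store, stage: &str, max_attempts: usize) -> Result<u64, Error> {
    let mut last_err = None;
    for _ in 0..max_attempts {
        let current = store.read();
        match store.update_if_version(current.version, stage) {
            Ok(new_version) => return Ok(new_version),
            Err(e) if e.code == Code::Aborted => last_err = Some(e),
            Err(e) => return Err(e),
        }
    }
    Err(last_err.unwrap_or_else(|| Error {
        code: Code::Internal,
        message: "max_attempts was zero".to_string(),
    }))
}
```

The point of a dedicated code is that the caller can tell "someone else updated this first, re-read and try again" apart from failures that should not be retried.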

towards #359


@allada allada left a comment

+@adam-singer

(this PR is part of a chain, so only review r2-r3)

Reviewable status: 0 of 1 LGTMs obtained, and 0 of 8 files reviewed, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, NativeLink.com Cloud / Remote Cache (Legacy Dockerfile Test), Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable (waiting on @adam-singer)

@allada allada force-pushed the simple-scheduler-abort-version-match branch from 771e98b to 7d78e70 Compare September 1, 2024 20:38

@adam-singer adam-singer left a comment

:lgtm:

Reviewed 2 of 2 files at r1, 6 of 6 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained, and all files reviewed

@allada allada force-pushed the simple-scheduler-abort-version-match branch from 7d78e70 to f5129de Compare September 5, 2024 16:14

@allada allada left a comment

Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained, and all files reviewed

@allada allada merged commit 753c1e7 into TraceMachina:main Sep 5, 2024
28 checks passed
@allada allada deleted the simple-scheduler-abort-version-match branch September 5, 2024 17:08