[Bug fix] Adds retry logic to redis store #1407
Reviewable status: 0 of 1 LGTMs obtained, and 0 of 7 files reviewed, and pending CI: Web Platform Deployment / ubuntu-24.04, pre-commit-checks (waiting on @adam-singer)
nativelink-metric-collector/Cargo.toml
line 9 at r1 (raw file):
[dependencies]
nativelink-metric = { path = "../nativelink-metric" }
opentelemetry = { version = "0.24.0", features = ["metrics"], default-features = false }
fyi: Fixes a test deps issue.
nativelink-store/src/redis_store.rs
line 311 at r1 (raw file):
    results: &mut [Option<u64>],
) -> Result<(), Error> {
    // TODO(allada) We could use pipeline here, but it makes retry more
fyi: This is most likely why cluster mode was not working.
nativelink-store/src/redis_store.rs
line 378 at r1 (raw file):
);
if is_zero_digest(key.borrow()) {
fyi: I tried to simplify this. We will optimize anything needed later. It turns out fred multiplexes commands on the same connection quite well, so we don't need to be very careful about "packing" our streams here, and there's a good chance we don't need all this old complexity.
nativelink-store/src/redis_store.rs
line 418 at r1 (raw file):
Ok(async move {
    client
        .setrange::<(), _, _>(temp_key_ref, offset, chunk)
fyi: Changed to setrange so retries don't interfere with append. Also likely more efficient/safe.
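To illustrate why a retried write is safer with setrange than with append, here is a minimal sketch. It does not use fred or a real Redis server: `FakeString`, `write_with_append`, and `write_with_setrange` are hypothetical names that only model the APPEND and SETRANGE semantics in memory, and the "retry" is simulated by sending one chunk twice (as if the first attempt succeeded but its reply was lost).

```rust
/// Hypothetical in-memory stand-in for a single Redis string value.
/// This is an illustration only, not part of nativelink or fred.
#[derive(Default)]
struct FakeString {
    data: Vec<u8>,
}

impl FakeString {
    /// Models APPEND: bytes always go to the end of the current value.
    fn append(&mut self, chunk: &[u8]) -> usize {
        self.data.extend_from_slice(chunk);
        self.data.len()
    }

    /// Models SETRANGE: bytes are written at a fixed offset, zero-padding
    /// the value if it is currently shorter than `offset + chunk.len()`.
    fn setrange(&mut self, offset: usize, chunk: &[u8]) -> usize {
        let end = offset + chunk.len();
        if self.data.len() < end {
            self.data.resize(end, 0);
        }
        self.data[offset..end].copy_from_slice(chunk);
        self.data.len()
    }
}

/// Writes all chunks with append, re-sending the chunk at index `resend`
/// once to simulate a retry after a lost reply.
fn write_with_append(chunks: &[&[u8]], resend: usize) -> Vec<u8> {
    let mut value = FakeString::default();
    for (i, chunk) in chunks.iter().enumerate() {
        value.append(chunk);
        if i == resend {
            value.append(chunk); // the retry duplicates the bytes
        }
    }
    value.data
}

/// Same upload, but each chunk is written at its own explicit offset,
/// so re-sending a chunk overwrites the same bytes again.
fn write_with_setrange(chunks: &[&[u8]], resend: usize) -> Vec<u8> {
    let mut value = FakeString::default();
    let mut offset = 0;
    for (i, chunk) in chunks.iter().enumerate() {
        value.setrange(offset, chunk);
        if i == resend {
            value.setrange(offset, chunk); // the retry is a no-op in effect
        }
        offset += chunk.len();
    }
    value.data
}
```

With the chunks `abc`, `def`, `ghi` and the second chunk re-sent, the append variant ends up with a duplicated `def`, while the setrange variant still holds the original nine bytes. That is the sense in which offset-based writes are idempotent under retry.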
nativelink-store/src/redis_store.rs
line 1070 at r1 (raw file):
};
let stream = run_ft_aggregate()?
    .or_else(|_| async move {
fyi: Just to simplify the code a bit.
Reviewed 6 of 7 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and all files reviewed, and 1 discussions need to be resolved (waiting on @adam-singer)
nativelink-store/src/redis_store.rs
line 238 at r2 (raw file):
let subscriber_client = builder
    .build_subscriber_client()
    .err_tip(|| "while creating redis subscriber client")?;
Shouldn't this PR throw an error if the redis connection fails in general?
For making the first connection with a publish client for example?
This would fix issue #1266
Reviewable status: 0 of 1 LGTMs obtained, and all files reviewed, and 1 discussions need to be resolved (waiting on @adam-singer)
nativelink-store/src/redis_store.rs
line 238 at r2 (raw file):
Previously, SchahinRohani (Schahin) wrote…
Shouldn't this PR throw an error if the redis connection fails in general?
For making the first connection with a publish client for example?
This would fix issue #1266
The redis library we use (fred) manages reconnects for us. This sets it all up. In this case, no: we don't want to slow down the startup of nativelink just because we don't have a redis connection. Instead, we queue up commands and dispatch them once it connects.
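The "queue up commands, then dispatch them when it connects" behavior can be sketched roughly as follows. This is not fred's implementation or API: `QueueingClient` and its methods are hypothetical names used only to show the idea of buffering commands while disconnected and flushing them in order on connect.

```rust
use std::collections::VecDeque;

/// Hypothetical client that buffers commands until a connection exists.
/// Illustration only; fred's real internals are not shown here.
struct QueueingClient {
    connected: bool,
    /// Commands issued while disconnected, in arrival order.
    pending: VecDeque<String>,
    /// Commands that have actually been dispatched to the "server".
    sent: Vec<String>,
}

impl QueueingClient {
    /// Construction never fails and never waits for a connection,
    /// so startup is not blocked on redis being reachable.
    fn new() -> Self {
        Self {
            connected: false,
            pending: VecDeque::new(),
            sent: Vec::new(),
        }
    }

    /// Dispatches immediately when connected, otherwise queues.
    fn send(&mut self, command: &str) {
        if self.connected {
            self.sent.push(command.to_string());
        } else {
            self.pending.push_back(command.to_string());
        }
    }

    /// Called when the connection (re)establishes: flush the queue in order.
    fn on_connect(&mut self) {
        self.connected = true;
        while let Some(command) = self.pending.pop_front() {
            self.sent.push(command);
        }
    }

    /// Called when the connection drops: later commands queue up again.
    fn on_disconnect(&mut self) {
        self.connected = false;
    }
}
```

The trade-off discussed in this thread is visible in `new`: a missing redis connection delays commands rather than failing startup.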
Reviewed all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and all files reviewed, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / macos-13, Bazel Dev / macos-14, Bazel Dev / ubuntu-24.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Coverage, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, NativeLink.com Cloud / Remote Cache / macos-14, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, Web Platform Deployment / macos-14, Web Platform Deployment / ubuntu-24.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable (waiting on @adam-singer)
Reviewable status: 1 of 1 LGTMs obtained, and all files reviewed, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / macos-13, Bazel Dev / macos-14, Bazel Dev / ubuntu-24.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Coverage, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, NativeLink.com Cloud / Remote Cache / macos-14, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, Web Platform Deployment / macos-14, Web Platform Deployment / ubuntu-24.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable
Adds retry logic and configs for redis store. This will set up redis to reconnect and retry commands if the connection to redis is lost. closes TraceMachina#1266
Reviewed 3 of 3 files at r4, all commit messages.
Reviewable status: complete! 1 of 1 LGTMs obtained, and all files reviewed