[Bug fix] Adds retry logic to redis store #1407

Merged
allada merged 1 commit into TraceMachina:main from redis-retry on Oct 22, 2024

allada
Member

@allada commented Oct 11, 2024

Adds retry logic and configs for redis store. This will set up redis to reconnect and retry commands if the connection to redis is lost.
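A minimal sketch of what a reconnect/retry policy does, assuming exponential backoff capped at a maximum delay. `RetryConfig`, its fields, and `retry` are hypothetical names chosen for illustration; they are not NativeLink's actual config keys or fred's reconnect-policy API (the PR configures fred rather than hand-rolling a loop like this).

```rust
use std::time::Duration;

/// Hypothetical retry settings; field names are illustrative only.
#[derive(Clone, Copy, Debug)]
struct RetryConfig {
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
    multiplier: u32,
}

impl RetryConfig {
    /// Delay before retry number `attempt` (0-based):
    /// base_delay * multiplier^attempt, capped at max_delay.
    fn delay_for(&self, attempt: u32) -> Duration {
        let factor = self.multiplier.saturating_pow(attempt);
        self.base_delay.saturating_mul(factor).min(self.max_delay)
    }
}

/// Runs `op` until it succeeds or `max_attempts` calls have failed,
/// calling `sleep` between attempts (injected so tests need not wait).
fn retry<T, E>(
    cfg: RetryConfig,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Give up once the last allowed attempt has failed.
            Err(e) if attempt + 1 >= cfg.max_attempts => return Err(e),
            Err(_) => {
                sleep(cfg.delay_for(attempt));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let cfg = RetryConfig {
        max_attempts: 5,
        base_delay: Duration::from_millis(100),
        max_delay: Duration::from_secs(1),
        multiplier: 2,
    };
    // Simulate a connection that fails twice before answering.
    let mut failures_left = 2;
    let result: Result<&str, &str> = retry(
        cfg,
        || {
            if failures_left > 0 {
                failures_left -= 1;
                Err("connection lost")
            } else {
                Ok("PONG")
            }
        },
        std::thread::sleep,
    );
    println!("{result:?}"); // Ok("PONG") after sleeping 100 ms, then 200 ms
}
```

Injecting `sleep` keeps the loop testable; an async client would await a timer instead of blocking a thread.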



Member Author

@allada left a comment

+@adam-singer

Reviewable status: 0 of 1 LGTMs obtained, and 0 of 7 files reviewed, and pending CI: Web Platform Deployment / ubuntu-24.04, pre-commit-checks (waiting on @adam-singer)


nativelink-metric-collector/Cargo.toml line 9 at r1 (raw file):

```toml
[dependencies]
nativelink-metric = { path = "../nativelink-metric" }
opentelemetry = { version = "0.24.0", features = ["metrics"], default-features = false }
```

fyi: This fixes a test dependency issue.


nativelink-store/src/redis_store.rs line 311 at r1 (raw file):

```rust
        results: &mut [Option<u64>],
    ) -> Result<(), Error> {
        // TODO(allada) We could use pipeline here, but it makes retry more
```

fyi: This is most likely why cluster mode was not working.
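The TODO above suggests this batched lookup issues one command per key rather than a pipeline. A minimal sketch of that shape, assuming a hypothetical `lookup` closure in place of the real fred client call and a hypothetical `sizes_per_key` helper: each key's command is retried on its own, so a failure partway through the batch repeats only the failed key's command.

```rust
/// Hypothetical stand-in for the batched lookup: fills `results[i]` with the
/// size reported for `keys[i]` (None if the key is missing). One command is
/// issued per key, and each key is retried independently.
fn sizes_per_key<E>(
    keys: &[&str],
    results: &mut [Option<u64>],
    max_attempts: u32,
    mut lookup: impl FnMut(&str) -> Result<Option<u64>, E>,
) -> Result<(), E> {
    assert_eq!(keys.len(), results.len(), "keys and results must line up");
    for (&key, slot) in keys.iter().zip(results.iter_mut()) {
        let mut attempt = 1;
        // Retry only this key's command; earlier keys keep their results.
        *slot = loop {
            match lookup(key) {
                Ok(size) => break size,
                Err(e) if attempt >= max_attempts => return Err(e),
                Err(_) => attempt += 1,
            }
        };
    }
    Ok(())
}

fn main() {
    let keys = ["digest-a", "digest-bb"];
    let mut results: [Option<u64>; 2] = [None; 2];
    // Hypothetical lookup: report the key length as the "size".
    let outcome: Result<(), String> =
        sizes_per_key(&keys, &mut results, 3, |k| Ok(Some(k.len() as u64)));
    println!("{outcome:?} {results:?}");
}
```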


nativelink-store/src/redis_store.rs line 378 at r1 (raw file):

```rust
        );

        if is_zero_digest(key.borrow()) {
```

fyi: I tried to simplify this; we can optimize anything that needs it later. It turns out fred multiplexes commands on the same connection quite well, so we don't need to be careful about "packing" our streams here, and there's a good chance we don't need all of the old complexity.
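To illustrate what sharing one connection among many callers looks like, here is a std-only sketch in which several threads send commands over a single "connection" and each receives its own reply. It is a simplified model, not how fred is implemented (fred is async, and a real connection can have several commands in flight at once); the uppercasing echo "server" is hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A request carries its command plus a private channel for the reply,
/// so callers never see each other's responses.
struct Request {
    command: String,
    reply: Sender<String>,
}

/// Stand-in for a single connection: one thread serving requests in
/// arrival order.
fn spawn_connection() -> Sender<Request> {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        for req in rx {
            // Hypothetical "server": echo the command back uppercased.
            let _ = req.reply.send(req.command.to_uppercase());
        }
    });
    tx
}

/// Any number of callers can share the same connection handle.
fn call(conn: &Sender<Request>, command: &str) -> String {
    let (reply_tx, reply_rx) = channel();
    conn.send(Request { command: command.to_string(), reply: reply_tx })
        .expect("connection thread is running");
    reply_rx.recv().expect("connection thread replied")
}

fn main() {
    let conn = spawn_connection();
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let conn = conn.clone();
            thread::spawn(move || call(&conn, &format!("get key{i}")))
        })
        .collect();
    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```

Because callers only share a queue into one connection, there is no need to batch ("pack") their commands by hand.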


nativelink-store/src/redis_store.rs line 418 at r1 (raw file):

```rust
                Ok(async move {
                    client
                        .setrange::<(), _, _>(temp_key_ref, offset, chunk)
```

fyi: Changed to setrange so that a retried write cannot duplicate data the way a retried append could. This is also likely more efficient and safer.
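Why setrange is safer under retries: if a write reaches Redis but its acknowledgement is lost, the client retries it. A retried APPEND adds the chunk a second time, while a retried SETRANGE at the same offset rewrites the same bytes. The sketch below models both commands' string semantics on a `Vec<u8>` (SETRANGE zero-pads when the offset is past the end); it is not a Redis client, and the chunk layout is hypothetical.

```rust
/// Models Redis APPEND: add bytes to the end of the value.
fn append(value: &mut Vec<u8>, chunk: &[u8]) {
    value.extend_from_slice(chunk);
}

/// Models Redis SETRANGE: overwrite starting at `offset`, zero-padding
/// first if the value is shorter than the write needs.
fn setrange(value: &mut Vec<u8>, offset: usize, chunk: &[u8]) {
    let end = offset + chunk.len();
    if value.len() < end {
        value.resize(end, 0);
    }
    value[offset..end].copy_from_slice(chunk);
}

fn main() {
    // Hypothetical upload split into two chunks at known offsets.
    let chunks = [(0usize, "hello ".as_bytes()), (6, "world".as_bytes())];
    let mut appended = Vec::new();
    let mut ranged = Vec::new();
    for &(offset, chunk) in &chunks {
        append(&mut appended, chunk);
        setrange(&mut ranged, offset, chunk);
    }
    // Simulate a retry of the last chunk after a lost acknowledgement.
    let (offset, chunk) = chunks[1];
    append(&mut appended, chunk);
    setrange(&mut ranged, offset, chunk);
    println!("{}", String::from_utf8_lossy(&appended)); // hello worldworld
    println!("{}", String::from_utf8_lossy(&ranged)); // hello world
}
```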


nativelink-store/src/redis_store.rs line 1070 at r1 (raw file):

```rust
        };
        let stream = run_ft_aggregate()?
            .or_else(|_| async move {
```

fyi: Just to simplify the code a bit.

Contributor

@SchahinRohani left a comment

Reviewed 6 of 7 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and all files reviewed, and 1 discussions need to be resolved (waiting on @adam-singer)


nativelink-store/src/redis_store.rs line 238 at r2 (raw file):

```rust
        let subscriber_client = builder
            .build_subscriber_client()
            .err_tip(|| "while creating redis subscriber client")?;
```

Shouldn't this PR throw an error if the redis connection fails in general?

For example, when making the first connection with a publish client?

This would fix issue #1266

Member Author

@allada left a comment

Reviewable status: 0 of 1 LGTMs obtained, and all files reviewed, and 1 discussions need to be resolved (waiting on @adam-singer)


nativelink-store/src/redis_store.rs line 238 at r2 (raw file):

Previously, SchahinRohani (Schahin) wrote…

Shouldn't this PR throw an error if the redis connection fails in general?

For example, when making the first connection with a publish client?

This would fix issue #1266

The redis library we use (fred) manages reconnects for us, and this PR sets all of that up. So in this case, no: we don't want to slow down NativeLink's startup just because there is no redis connection yet. Instead, commands are queued and dispatched once the connection is established.
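A minimal sketch of the behavior described above, assuming a simplified client: commands issued before Redis is reachable are buffered and flushed in order once the connection comes up, so startup never blocks on Redis. `QueuingClient` and its methods are hypothetical; in the PR, fred provides this behavior and NativeLink only configures it.

```rust
use std::collections::VecDeque;

/// Hypothetical client state: commands issued while disconnected are
/// buffered, then flushed in order once the connection is established.
#[derive(Default)]
struct QueuingClient {
    connected: bool,
    pending: VecDeque<String>,
    /// Stand-in for the socket: commands actually written to the server.
    written: Vec<String>,
}

impl QueuingClient {
    fn send(&mut self, command: &str) {
        if self.connected {
            self.written.push(command.to_string());
        } else {
            self.pending.push_back(command.to_string());
        }
    }

    /// Called when a (re)connect succeeds: drain the queue in FIFO order.
    fn on_connected(&mut self) {
        self.connected = true;
        while let Some(cmd) = self.pending.pop_front() {
            self.written.push(cmd);
        }
    }

    fn on_disconnected(&mut self) {
        self.connected = false;
    }
}

fn main() {
    // Startup proceeds even though Redis is not reachable yet.
    let mut client = QueuingClient::default();
    client.send("SET a 1");
    client.send("GET a");
    client.on_connected();
    client.send("GET b");
    println!("{:?}", client.written); // ["SET a 1", "GET a", "GET b"]
    client.on_disconnected();
}
```

A production client would also need to bound this queue or time out queued commands; that is omitted here.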

Contributor

@SchahinRohani left a comment

:lgtm:

Reviewed all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and all files reviewed, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / macos-13, Bazel Dev / macos-14, Bazel Dev / ubuntu-24.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Coverage, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, NativeLink.com Cloud / Remote Cache / macos-14, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, Web Platform Deployment / macos-14, Web Platform Deployment / ubuntu-24.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable (waiting on @adam-singer)

Contributor

@SchahinRohani left a comment

-@adam-singer

Reviewable status: 1 of 1 LGTMs obtained, and all files reviewed, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / macos-13, Bazel Dev / macos-14, Bazel Dev / ubuntu-24.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Coverage, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, NativeLink.com Cloud / Remote Cache / macos-14, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, Web Platform Deployment / macos-14, Web Platform Deployment / ubuntu-24.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable

Adds retry logic and configs for redis store. This will set up redis to
reconnect and retry commands if the connection to redis is lost.

closes TraceMachina#1266
Contributor

@SchahinRohani left a comment

Reviewed 3 of 3 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained, and all files reviewed

@allada merged commit a815ba0 into TraceMachina:main on Oct 22, 2024
35 checks passed
@allada deleted the redis-retry branch October 22, 2024 18:05
3 participants