Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

errors: fix driver's logic that bases on error variants returned from query execution #1075

Merged

Conversation

muzarski
Copy link
Contributor

@muzarski muzarski commented Sep 19, 2024

Ref: #519

Motivation

The error refactor that I'm currently working on, surprisingly changed driver's logic in some cases where the decision is made based on the error returned from query execution. There are three places where driver's logic was silently altered:

  • retry_policy -> driver should retry execution in case of IoError
  • use_keyspace -> if query execution returned IoErrors for some (not all) nodes/connections, these errors should be ignored
  • speculative_exeuction -> driver should ignore IoErrors in this context

During error refactor I changed all places where QueryError::IoError was constructed. Now all of these IoErrors are represented as BrokenConnectionError and ConnectionPoolError types (and their variants). These types represent an error which implies that a corresponding connection/node is broken/unreachable.

Changes

  • Adjusted driver's logic in places mentioned above.
  • Removed [Query/NewSession]Error::IoError variants altogether. They are no longer constructed anywhere - they were replaced by ConnectionPoolError and BrokenConnection variants during previous refactors.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • [ ] I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@muzarski muzarski self-assigned this Sep 19, 2024
@muzarski muzarski added the bug Something isn't working label Sep 19, 2024
@muzarski muzarski added this to the 0.15.0 milestone Sep 19, 2024
@github-actions github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Sep 19, 2024
Copy link

github-actions bot commented Sep 19, 2024

cargo semver-checks detected some API incompatibilities in this PR.
Checked commit: ce0fdca

See the following report for details:

cargo semver-checks output
./scripts/semver-checks.sh --baseline-rev 44a309343e897da31ddfe226a94af8e1f9561b0b
+ cargo semver-checks -p scylla -p scylla-cql --baseline-rev 44a309343e897da31ddfe226a94af8e1f9561b0b
     Cloning 44a309343e897da31ddfe226a94af8e1f9561b0b
     Parsing scylla v0.14.0 (current)
      Parsed [  21.605s] (current)
     Parsing scylla v0.14.0 (baseline)
      Parsed [  20.108s] (baseline)
    Checking scylla v0.14.0 -> v0.14.0 (no change)
     Checked [   0.103s] 89 checks: 88 pass, 1 fail, 0 warn, 0 skip

--- failure enum_variant_missing: pub enum variant removed or renamed ---

Description:
A publicly-visible enum has at least one variant that is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.35.0/src/lints/enum_variant_missing.ron

Failed in:
  variant NewSessionError::IoError, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-44a309343e897da31ddfe226a94af8e1f9561b0b/edcf569f3961e73847e8e0c9f14adf367bbeb770/scylla/src/transport/errors.rs:212
  variant QueryError::IoError, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-44a309343e897da31ddfe226a94af8e1f9561b0b/edcf569f3961e73847e8e0c9f14adf367bbeb770/scylla/src/transport/errors.rs:57

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  41.871s] scylla
     Parsing scylla-cql v0.3.0 (current)
      Parsed [  10.192s] (current)
     Parsing scylla-cql v0.3.0 (baseline)
      Parsed [  10.193s] (baseline)
    Checking scylla-cql v0.3.0 -> v0.3.0 (no change)
     Checked [   0.099s] 89 checks: 89 pass, 0 skip
     Summary no semver update required
    Finished [  20.540s] scylla-cql
make: *** [Makefile:61: semver-rev] Error 1

QueryError::IoError is replaced by two new variants which provide
more context - them being `BrokenConnection` and `ConnectionPoolError`.
In a later commit, we will remove IoError variant
(since it is not constructed anywhere).

We need to adjust the retry decision - we want to retry execution
on some other node in case one of these two error types appeared.
@Lorak-mmk
Copy link
Collaborator

The problems you mentioned sound like serious bugs. Are they present in 0.14 or only on main?

@wprzytula
Copy link
Collaborator

The problems you mentioned sound like serious bugs. Are they present in 0.14 or only on main?

From what I see, they are directly caused by PRs merged in recent days.

@muzarski
Copy link
Contributor Author

The problems you mentioned sound like serious bugs. Are they present in 0.14 or only on main?

It was merged yesterday: #1067, so only main is affected

@muzarski muzarski force-pushed the fix-use-keyspace-errors-management-logic branch from 472c32b to 3172040 Compare September 19, 2024 10:04
@muzarski
Copy link
Contributor Author

v1.1: Deduplicated the logic shared between ClusterWorker and PoolRefiller

@muzarski muzarski requested a review from wprzytula September 19, 2024 10:07
scylla/src/transport/cluster.rs Outdated Show resolved Hide resolved
Extracted the logic of retrieving use_keyspace result to
utility function.
The QueryError::IoErrors are no longer created. They now correspond
to either QueryError::ConnectionPoolError or QueryError::BrokenConnection.

We need to adjust the use_keyspace logic so it does not fail the operation
if any of these error variants is returned.
The decision whether an error can be ignored in the context
of speculative execution needs to be adjusted as well.
We should ignore broken connection errors (previously IoErrors).
As I mentioned previously, these variants were replaced
by `BrokenConnection` and `ConnectionPoolError` variants.

We can now get rid of the variants that are never constructed.
We also need to remove all usages of these variants
(retry_policy/use_keyspace/speculative_execution).
@muzarski muzarski force-pushed the fix-use-keyspace-errors-management-logic branch from 3172040 to ce0fdca Compare September 19, 2024 10:18
@muzarski muzarski requested a review from wprzytula September 19, 2024 10:18
@wprzytula wprzytula merged commit ab07be6 into scylladb:main Sep 20, 2024
11 checks passed
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 4, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants in retry policy
modules.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 4, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants when deciding
if error received after `USE KEYSPACE` should be ignored.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 4, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants when deciding
if error received from speculative execution should be ignored.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 8, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants in `reliable_latency_measure` function.

We also enable the `wildcard_enum_match_arm` clippy lint here.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 8, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants in retry policy
modules.

We also enable `wildcard_enum_match_arm` clippy lint in this place
for QueryError and DbError matches.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 8, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants when deciding
if error received after `USE KEYSPACE` should be ignored.

We also enable the `wildcard_enum_match_arm` clippy lint to disallow
using `_` matches.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 8, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants when deciding
if error received from speculative execution should be ignored.

We also enabled the `wildcard_enum_match_arm` clippy lint.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 14, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants in `reliable_latency_measure` function.

We also enable the `wildcard_enum_match_arm` clippy lint here.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 14, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants in retry policy
modules.

We also enable `wildcard_enum_match_arm` clippy lint in this place
for QueryError and DbError matches.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 14, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants when deciding
if error received after `USE KEYSPACE` should be ignored.

We also enable the `wildcard_enum_match_arm` clippy lint to disallow
using `_` matches.
muzarski added a commit to muzarski/scylla-rust-driver that referenced this pull request Oct 14, 2024
Since last time, during error refactor I introduced a silent
bug to the code (scylladb#1075),
I'd like to prevent that from happening in the future. This is why
we replace a `_` match with explicit error variants when deciding
if error received from speculative execution should be ignored.

We also enabled the `wildcard_enum_match_arm` clippy lint.
@wprzytula wprzytula mentioned this pull request Nov 14, 2024
@muzarski muzarski deleted the fix-use-keyspace-errors-management-logic branch December 19, 2024 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants