Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

errors: refactor error types on a connection level #1067

Merged
merged 33 commits into from
Sep 18, 2024

Conversation

muzarski
Copy link
Contributor

Ref: #519

Motivation

My main motivation for this PR, was the @Lorak-mmk comment, that currently the QueryError contains a CqlResponseParseError. CqlResponseParseError contains variants which represent a parsing errors which SHOULD NOT appear when executing requests via user API e.g. CqlAuthChallengeParseError, which can only happen during connection setup (unless a protocol error appeared, and server sent us an incorrect response, with an incorrect type).

Since QueryError is abused everywhere, a lot of changes needed to be introduced before getting rid of the variants mentioned above. However, there are some positives, since we could address some other issues as well.

BrokenConnectionError

One of the main entry points for a connection is a Connection::router function, which starts 4 main async tasks (reader, writer, keepaliver, orphaner). With current driver's design, if one of them fails, the whole connection is treated as broken, and closed. Before this PR, all of these tasks would return a QueryError, which is a really broad type destined for the users. It contains serialization errors, which cannot appear during connection's async tasks work. This is why I introduced a BrokenConnectionError type, returned if one of these 4 tasks fail.

In this PR, we perform an error transition refactor in two places:

  • PoolRefiller <-> Connection layer
  • Session (user) <-> Connection layer (addresses the issue mentioned in PR's motivation)

PoolRefiller <-> Connection

There are some errors that can appear on a connection level which will not reach user's end. They are handled and logged by PoolRefiller of a corresponding node. This is why, it's a great place to introduce a new error type destined specifically for this case - ConnectionError.

The most important variant of this error is ConnectionSetupRequestError (with the type of the same name). Once again, the CqlResponseParseError contains some variants which are not applicable here - e.g. CqlResultParseError. I added more context for the errors that can appear during connection setup (STARTUP, OPTIONS, AUTH requests).

Session <-> Connection

This is completely analogous to the above transition. We need to filter out some errors types which are not applicable in this case, and add some context to the errors. The most important error type here is UserRequestError which corresponds to the errors that can appear when executing one of PREPARE, QUERY, EXECUTE or BATCH requests.

Transition error types

To achieve all the things listed above, we also needed to introduce some driver's internal transition error types, such as RequestError or ResponseParseError They were created only to narrow the error types of some driver's internal functions.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@muzarski muzarski self-assigned this Aug 26, 2024
@muzarski muzarski added this to the 0.15.0 milestone Aug 26, 2024
@github-actions github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Aug 26, 2024
Copy link

github-actions bot commented Aug 26, 2024

cargo semver-checks detected some API incompatibilities in this PR.
Checked commit: 9259dea

See the following report for details:

cargo semver-checks output
./scripts/semver-checks.sh --baseline-rev 9ff1161574d1222ede3965341b8c09ee706a55dd
+ cargo semver-checks -p scylla -p scylla-cql --baseline-rev 9ff1161574d1222ede3965341b8c09ee706a55dd
     Cloning 9ff1161574d1222ede3965341b8c09ee706a55dd
     Parsing scylla v0.14.0 (current)
      Parsed [  21.914s] (current)
     Parsing scylla v0.14.0 (baseline)
      Parsed [  20.530s] (baseline)
    Checking scylla v0.14.0 -> v0.14.0 (no change)
     Checked [   0.090s] 89 checks: 89 pass, 0 skip
     Summary no semver update required
    Finished [  42.585s] scylla
     Parsing scylla-cql v0.3.0 (current)
      Parsed [  10.346s] (current)
     Parsing scylla-cql v0.3.0 (baseline)
      Parsed [  10.264s] (baseline)
    Checking scylla-cql v0.3.0 -> v0.3.0 (no change)
     Checked [   0.100s] 89 checks: 86 pass, 3 fail, 0 warn, 0 skip

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.35.0/src/lints/enum_variant_added.ron

Failed in:
  variant QueryError:CqlResultParseError in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/errors.rs:33
  variant QueryError:CqlErrorParseError in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/errors.rs:37
  variant QueryError:ConnectionPoolError in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/errors.rs:45
  variant QueryError:BrokenConnection in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/errors.rs:63
  variant NewSessionError:CqlResultParseError in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/errors.rs:495
  variant NewSessionError:CqlErrorParseError in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/errors.rs:499
  variant NewSessionError:ConnectionPoolError in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/errors.rs:507
  variant NewSessionError:BrokenConnection in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/errors.rs:525

--- failure enum_variant_missing: pub enum variant removed or renamed ---

Description:
A publicly-visible enum has at least one variant that is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.35.0/src/lints/enum_variant_missing.ron

Failed in:
  variant NewSessionError::CqlResponseParseError, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-9ff1161574d1222ede3965341b8c09ee706a55dd/a33fca0be1be249ea2a74ce24dc27b8a32b2ebd5/scylla-cql/src/errors.rs:391
  variant NewSessionError::TranslationError, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-9ff1161574d1222ede3965341b8c09ee706a55dd/a33fca0be1be249ea2a74ce24dc27b8a32b2ebd5/scylla-cql/src/errors.rs:422
  variant QueryError::CqlResponseParseError, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-9ff1161574d1222ede3965341b8c09ee706a55dd/a33fca0be1be249ea2a74ce24dc27b8a32b2ebd5/scylla-cql/src/errors.rs:27
  variant QueryError::TranslationError, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-9ff1161574d1222ede3965341b8c09ee706a55dd/a33fca0be1be249ea2a74ce24dc27b8a32b2ebd5/scylla-cql/src/errors.rs:57

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.35.0/src/lints/inherent_method_missing.ron

Failed in:
  QueryError::is_address_unavailable_for_use, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-9ff1161574d1222ede3965341b8c09ee706a55dd/a33fca0be1be249ea2a74ce24dc27b8a32b2ebd5/scylla-cql/src/errors.rs:532

     Summary semver requires new major version: 3 major and 0 minor checks failed
    Finished [  20.768s] scylla-cql
make: *** [Makefile:61: semver-rev] Error 1

@muzarski muzarski force-pushed the connection_broken_error branch 2 times, most recently from 1012a92 to fc0b4fe Compare August 26, 2024 15:54
@muzarski muzarski mentioned this pull request Aug 27, 2024
6 tasks
@muzarski muzarski force-pushed the connection_broken_error branch from fc0b4fe to c31a520 Compare August 27, 2024 11:59
@muzarski
Copy link
Contributor Author

Rebased on main

Copy link
Collaborator

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: it looks like you are not using non_exhaustive on new error types?
Did you decide that types introduced in this PR don't need it or is there some other reason?

scylla/src/transport/connection_pool.rs Outdated Show resolved Hide resolved
scylla-cql/src/errors.rs Outdated Show resolved Hide resolved
scylla/src/transport/connection.rs Outdated Show resolved Hide resolved
scylla-cql/src/errors.rs Show resolved Hide resolved
scylla/src/transport/connection.rs Outdated Show resolved Hide resolved
scylla-cql/src/errors.rs Show resolved Hide resolved
scylla-cql/src/errors.rs Show resolved Hide resolved
scylla/src/transport/connection.rs Show resolved Hide resolved
scylla/src/transport/connection.rs Show resolved Hide resolved
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR made me feel like being told a breath-taking story of fighting the evil QueryError that has infected our codebase 😄
Amazing commit separation and messages, the review was a very smooth experience.

scylla/src/transport/connection.rs Outdated Show resolved Hide resolved
scylla-cql/src/errors.rs Outdated Show resolved Hide resolved
scylla-cql/src/errors.rs Outdated Show resolved Hide resolved
scylla-cql/src/errors.rs Outdated Show resolved Hide resolved
scylla-cql/src/errors.rs Outdated Show resolved Hide resolved
scylla-cql/src/frame/response/mod.rs Outdated Show resolved Hide resolved
scylla/src/transport/connection.rs Outdated Show resolved Hide resolved
scylla/src/transport/connection.rs Outdated Show resolved Hide resolved
scylla-cql/src/frame/response/mod.rs Outdated Show resolved Hide resolved
scylla-cql/src/errors.rs Show resolved Hide resolved
@muzarski
Copy link
Contributor Author

v1.1:
introduced ConnectionPoolError (last commit).

scylla-cql/src/errors.rs Outdated Show resolved Hide resolved
@muzarski muzarski force-pushed the connection_broken_error branch from 2a491f6 to c63e415 Compare August 29, 2024 14:44
@muzarski
Copy link
Contributor Author

v2: Addressed most of review comments.

@muzarski muzarski force-pushed the connection_broken_error branch from c63e415 to c0e8e05 Compare August 29, 2024 14:49
@muzarski
Copy link
Contributor Author

v2.1: rebased on main

@muzarski muzarski force-pushed the connection_broken_error branch from c0e8e05 to 93234d8 Compare August 29, 2024 15:45
@muzarski
Copy link
Contributor Author

v2.2: replaced BrokenConnectionErrorKind::IoError with ChannelError variant, which provides more insightful error message explaining why channel operation failed.

Previously, all functions related to selecting a connection, would
return a QueryError::IoError. There is no need to return such
broad error type as QueryError, when only std::io::Error can appear.
Receiving a non-event response on a -1 stream should be considered
an error.
@muzarski muzarski force-pushed the connection_broken_error branch from 93234d8 to 8282d0a Compare September 17, 2024 14:32
@muzarski
Copy link
Contributor Author

Rebased on main.

@muzarski muzarski force-pushed the connection_broken_error branch from 8282d0a to 7776ff1 Compare September 17, 2024 15:45
@muzarski
Copy link
Contributor Author

v2.3: introduced CqlRequestKind enum

Add an enum representing possible CQL requests. This enum is useful
for creating strongly typed error types (e.g. ConnectionSetupRequestError
introduced in a following commit).
Introduced new error type designed to be used for the requests
performed during connection setup - e.g. OPTIONS or STARTUP.

In this commit, we also adjust Connection::get_options method.
It will now extract the supported options, and return
appropriate error in case of failure.
Now error filtering happens in Connection::startup method.
Introduced NonErrorStartupResponse to represent one of two possible
non-error responses to STARTUP request.
Since we tidied up the error types for connection setup requests,
so now they return the ConnectionSetupRequestError, we can remove
the QueryError dependency from ConnectionError.
This is an error that appears case of
failure of user request (requests triggered via public API, i.e.
PREPARE, EXECUTE, QUERY, BATCH).

Its only purpose is to filter out impossible error variants from
`CqlResponseParseError` that appear in `RequestError` type. For example,
it should not be possible to receive `CqlSupportedParseError` when parsing
the response to user request. It means that the server sent an invalid
response to the corresponding request.

The From<UserRequestError> implementation will be adjusted in a following
commits. Firstly, we need to narrow the type of corresponding functions
and methods to UserRequestError.
As commit title states, this commit implements a transition
from RequestError to UserRequestError.

The only accepted responses for user requests are RESULT and ERROR.
If we started deserialization of some other response, but failed,
we will treat it as unexpected response error.
It just narrows the error type from QueryError to UserRequestError
Unfortunately, I didn't know how to decouple this commit
into two separate commits. This is due to the trait bounds
on e.g. RowIteratorWorker::work implementation.
Narrowed return error type of functions that convert
QueryResponse into NonErrorQueryResponse and QueryResult.
Ultimately, we want to get rid of From<RequestError> for QueryError
implementation. This is because, RequestError contains a
CqlResponseParseError which is overloaded with variants such
as CqlAuthChallengeParseError which should not be returned to the user
who uses only BATCH, QUERY, EXECUTE and PREPARE requests.

This is why, we make two transitions in error types in this place.
The first transition is RequestError -> UserRequestError (map_err()),
which filters out variants such as CqlAuthChallengeParseError, and leaves
only either CqlErrorParseError or CqlResultParseError.
The second transition (? operator) is UserRequestError -> QueryError
which makes use of From<UserRequestError> for QueryError implementation.
Since we introduced a transition error type for user requests
(UserRequestError), and adjusted the code to it, we can now
revert the temporary From implementation.
Since requests triggered via user API, cannot receive a response
other than ERROR or RESULT, we need to narrow the response parsing
errors accordingly. Before this commit, we would hold a
CqlResponseParseError which contains a parsing errors for responses
that should never be returned by the server for user requests.

After hard work from previous commits, we can finally replace
this overloaded variant with two variants
- CqlResultParseError -> failed to deserialize RESULT response
- CqlErrorParseError -> failed to deserialize ERROR response

In case of failure of parsing other response, we will return a
QueryError::ProtocolError saying that server returned an unexpected
response. I think we should add more context to this error,
but let's leave it for other PR which will be focused on ProtocolError refactor.
In a following commit, we will introduce a ConnectionPoolError,
which will contain a variant with a ConnectionError (the error registered
for last connection).

At the place of creating this variant, we only have access to the
reference of ConnectionError. Thus, to make it owned, cloning is
required.
Introduced a ConnectionPoolError which appears when
we were unable to select a connection from the connection pool.
This conversion is no longer used, as IO errors
are already handled by transition error types.
Appended `RESULT:` prefix to RESULT response types.
@muzarski
Copy link
Contributor Author

I opened a follow-up PR that moves error types to scylla crate and adjusts their pub-visibility: #1074

@Lorak-mmk
Copy link
Collaborator

Looks fine to me now, but there are still open comments from @wprzytula. Please go over them and mark the fixed ones as resolved.

@muzarski
Copy link
Contributor Author

Looks fine to me now, but there are still open comments from @wprzytula. Please go over them and mark the fixed ones as resolved.

Done. The comments were regarding pub-visibility of error types. This issue is addressed in a follow-up PR.

@wprzytula wprzytula merged commit d08018b into scylladb:main Sep 18, 2024
11 checks passed
@muzarski muzarski deleted the connection_broken_error branch October 29, 2024 14:36
@wprzytula wprzytula mentioned this pull request Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants