Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(error): report the error's source chain through psql #13264

Merged
merged 15 commits into from
Nov 9, 2023

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Nov 6, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Follow-up of #13248. Visit the error's source chain and get it printed through psql interface. This is achieved by thiserror_ext::Report.

Note that we now interpolate the inner error message into the outer one, so the most of time, printing the out-most error should include all information. However, this is not a good practice that the community will follow (examples: awslabs/aws-sdk-rust#657, EmbarkStudios/rust-ecosystem#20). As a result, we lose the opportunity to visit the source of errors from 3rd-party libraries (like tonic::transport::Error), which leads to issues like #13163.

As discussed in #11443 (comment), snafu provides a clever way to handle these two conventions together by trimming out the duplicated suffixes in the error message. thiserror-ext adapts it to better suit our needs.

Preview

Unsure about which format is preferred, I introduce a env variable RW_PRETTY_ERROR to switch between these two formats:

The style will always be Pretty.

  • Compact NOT AVAILABLE:
    image

    The experience will be majorly the same, except that we're now correctly visiting the source chain to avoid losing any context.

  • Pretty:
    image

    The sources will be printed line by line. The * symbol indicates that the suffix of the message (i.e., the message of the inner error) has been cleaned up. REMOVED

Tests

There's a new series of tests under e2e_test/error_ui/main.slt to track the error reporting of some selected SQL statements.

statement error
create function int_42() returns int as int_42 using link 'localhost:8815';
----
db error: ERROR: QueryError
Caused by these errors (recent errors listed first):
1: failed to connect to UDF service
2: transport error
3: error trying to connect
4: invalid URL, scheme is missing

See https://github.com/risingwavelabs/risingwave/blob/d60b7e13a69b2ba56cc8411e6317662ed9e39f03/e2e_test/error_ui/README.md for details.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Base automatically changed from bz/maintain-error-source to main November 7, 2023 04:25
This reverts commit 932036b.
Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao BugenZhao force-pushed the bz/display-error-source branch from 8b89b22 to a474acb Compare November 7, 2023 04:35
Signed-off-by: Bugen Zhao <[email protected]>
@@ -44,8 +44,9 @@ pub enum PsqlError {
#[error(transparent)]
Internal(BoxedError),

#[error("Panicked when processing: {0}.\n
This is a bug. We would appreciate a bug report at https://github.com/risingwavelabs/risingwave/issues/new?labels=type%2Fbug&template=bug_report.yml.")]
#[error("Panicked when processing: {0}
Copy link
Member Author

@BugenZhao BugenZhao Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that mutli-line string already contains a \n at the line break. 🤡

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BugenZhao BugenZhao marked this pull request as ready for review November 7, 2023 05:35
@BugenZhao BugenZhao requested a review from a team as a code owner November 7, 2023 05:35
@BugenZhao BugenZhao requested review from xxchan, stdrc and hzxa21 November 7, 2023 05:36
Copy link

codecov bot commented Nov 7, 2023

Codecov Report

Merging #13264 (41c88a5) into main (901d9b0) will increase coverage by 0.07%.
Report is 12 commits behind head on main.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main   #13264      +/-   ##
==========================================
+ Coverage   67.77%   67.85%   +0.07%     
==========================================
  Files        1525     1525              
  Lines      259116   259118       +2     
==========================================
+ Hits       175605   175813     +208     
+ Misses      83511    83305     -206     
Flag Coverage Δ
rust 67.85% <0.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/utils/pgwire/src/error.rs 0.00% <ø> (ø)
src/utils/pgwire/src/pg_message.rs 64.50% <0.00%> (-0.34%) ⬇️

... and 23 files with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

Copy link
Member

@stdrc stdrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM! Just one concern: the * representing omitted suffix seems unnecessary, since we don't lose any information actually, the omitted information is just printed on the next line.

Comment on lines 638 to 642
let msg = if env_var_is_true("RW_PRETTY_ERROR") {
format!("{:#}", error.as_ref().as_report())
} else {
format!("{}", error.as_ref().as_report())
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we ever want the Compact behavior? I prefer not to offer too many choices unless there's a good reason.

Copy link
Member

@xxchan xxchan Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, can we add a test (maybe expect test) for an error message to demonstrate the error format?

Copy link
Member Author

@BugenZhao BugenZhao Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we ever want the Compact behavior?

This could be a temporary workaround. 😕 SQL logic tests and planner tests are heavily relying on the error message, and I don't want to mess them into this PR.

Also, I'm not sure if the developers writing SQL logic tests are intended to match the full messages (instead of the interesting part only), like below. If so, then Pretty will be less friendly.

statement error internal error: invalid digit found in string

statement error QueryError: Bind error: update modifying the PK column is unsupported

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, can we add a test (maybe expect test) for an error message to demonstrate the error format?

There's one in thiserror-ext. Do you want another one for RisingWave's real cases? Then I'll write an integration test connecting to the pgwire server.

Copy link
Member

@xxchan xxchan Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a temporary workaround. 😕 SQL logic tests and planner tests are heavily relying on the error message, and I don't want to mess them into this PR.

If DX is the only concern or trouble, I think we should try to fix it now or later. At least pretty should be the default behavior. (Otherwise what's the meaning of these work? lol)

I think it's ok to update tests in this PR, as the core changes are very small.

If you don't want to do it now, maybe we should introduce RW_COMPACT_ERROR, instead of RW_PRETTY_ERROR.

Do you want another one for RisingWave's real cases?

Yes, that's preferred. So that we can prevent either risingwave or thiserror-ext's unexpected changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update error messages in slt and pretty is always enabled now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want another one for RisingWave's real cases?

Yes, that's preferred. So that we can prevent either risingwave or thiserror-ext's unexpected changes.

I find the error message is not good enough... Maybe let's do this in future PRs. 🤡

Copy link
Member

@xxchan xxchan Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK if the error message is not good. Just have a expect-test to let us know what's the current status is enough. 🤣 And we can see more easily how it's improved in future PRs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find writing tests with madsim is the easiest way, but the udf example does not work there. 🤡

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤡

@xxchan
Copy link
Member

xxchan commented Nov 7, 2023

Similar concern, I think the * symbol is somehow confusing to users.

@BugenZhao
Copy link
Member Author

I copied * from snafu. Then let's remove it. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄🤣

@BugenZhao BugenZhao added this pull request to the merge queue Nov 9, 2023
Merged via the queue into main with commit 7b9896c Nov 9, 2023
9 of 10 checks passed
@BugenZhao BugenZhao deleted the bz/display-error-source branch November 9, 2023 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants