fix(storage): lock table may cause deadlock and block iter #13979

Merged
merged 5 commits into from
Dec 14, 2023

Conversation

Little-Wallace
Contributor

@Little-Wallace Little-Wallace commented Dec 13, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Fix #13943

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional; recommended for new SQL features. See "Sqlsmith: Sql feature generation", #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Little-Wallace <[email protected]>
@github-actions github-actions bot added the type/fix Bug fix label Dec 13, 2023

codecov bot commented Dec 13, 2023

Codecov Report

Attention: 189 lines in your changes are missing coverage. Please review.

Comparison is base (1e56cdb) 68.06% compared to head (3860921) 68.04%.
Report is 2 commits behind head on main.

Files                                       Patch %   Lines
src/connector/src/sink/deltalake.rs         51.10%    177 Missing ⚠️
src/common/src/array/arrow/arrow_impl.rs    55.00%    9 Missing ⚠️
src/connector/src/sink/mod.rs               0.00%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13979      +/-   ##
==========================================
- Coverage   68.06%   68.04%   -0.03%     
==========================================
  Files        1535     1536       +1     
  Lines      265039   265364     +325     
==========================================
+ Hits       180407   180568     +161     
- Misses      84632    84796     +164     
Flag Coverage Δ
rust 68.04% <51.41%> (-0.03%) ⬇️


Signed-off-by: Little-Wallace <[email protected]>
@hzxa21
Collaborator

hzxa21 commented Dec 13, 2023

Can you explain how the deadlock happens in the PR description?

@Little-Wallace
Contributor Author

Can you explain how the deadlock happens in the PR description?

I do not understand it yet, but the stack trace shows that it hung on the lock table.

@MrCroxx MrCroxx self-requested a review December 14, 2023 05:19
@MrCroxx
Contributor

MrCroxx commented Dec 14, 2023

Can you explain how the deadlock happens in the PR description?

I do not understand it yet, but the stack trace shows that it hung on the lock table.

Is there an await-tree dump?

@chenzl25
Contributor

Related issue #13932

@chenzl25
Contributor

I just tested a backfill on a table with a parallelism of 1 and it succeeded, so this appears to be a deadlock between different parallel instances.

@Little-Wallace Little-Wallace added this pull request to the merge queue Dec 14, 2023
@chenzl25 chenzl25 removed this pull request from the merge queue due to a manual request Dec 14, 2023
Contributor

@chenzl25 chenzl25 left a comment


We should create the LockGuard before awaiting notified(). Otherwise, if the future is dropped at that await point, no guard exists yet, so nothing releases the lock.

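        // Create the guard first so that it is dropped together with this
        // future if the future is cancelled at the await point below.
        let lock = LockGuard {
            key,
            locks: self.locks.clone(),
        };
        if let Some(notify) = notify {
            notify.notified().await;
        }
        return lock;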
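
The ordering suggested above can be illustrated with a small, standard-library-only sketch. It is not the RisingWave implementation: `Locks`, `Never`, `lock_guard_after_wait`, `lock_guard_before_wait`, and `leaked_after_cancel` are hypothetical names, the lock table is reduced to a set of keys, and `Never` stands in for a notification that has not arrived yet. It also assumes that the guard's `Drop` is what removes the entry from the table. Each future is polled once and then dropped, which simulates cancellation at the await point.

```rust
use std::collections::HashSet;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical lock table: just the set of keys that are currently locked.
type Locks = Arc<Mutex<HashSet<u64>>>;

// Hypothetical RAII guard (assumption): dropping it releases the key.
struct LockGuard {
    key: u64,
    locks: Locks,
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        self.locks.lock().unwrap().remove(&self.key);
    }
}

// A future that is always pending; stands in for waiting on a notification.
struct Never;

impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

// Guard is built only after the await: cancellation leaves the key behind.
async fn lock_guard_after_wait(key: u64, locks: Locks) -> LockGuard {
    locks.lock().unwrap().insert(key);
    Never.await;
    LockGuard { key, locks }
}

// Guard is built before the await: cancellation drops it and frees the key.
async fn lock_guard_before_wait(key: u64, locks: Locks) -> LockGuard {
    locks.lock().unwrap().insert(key);
    let guard = LockGuard { key, locks };
    Never.await;
    guard
}

// Waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls `fut` once, drops it (simulated cancellation), and reports whether
// `key` is still present in the table afterwards.
fn leaked_after_cancel<F: Future>(fut: F, key: u64, locks: &Locks) -> bool {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    assert!(fut.as_mut().poll(&mut cx).is_pending());
    drop(fut);
    locks.lock().unwrap().contains(&key)
}
```

Calling `leaked_after_cancel` with `lock_guard_after_wait` reports a leaked key, while the same call with `lock_guard_before_wait` reports a clean table, because the guard lives inside the future's state and is dropped along with it.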

@st1page
Copy link
Contributor

st1page commented Dec 14, 2023

Can you explain how the deadlock happens in the PR description?

I do not understand it yet, but the stack trace shows that it hung on the lock table.

Is there an await-tree dump?

https://risingwave-labs.slack.com/archives/C069B78AXDW/p1702439327854279

@chenzl25 chenzl25 added this pull request to the merge queue Dec 14, 2023
Merged via the queue into main with commit d8dd8f4 Dec 14, 2023
26 of 27 checks passed
@chenzl25 chenzl25 deleted the fix-deadlock branch December 14, 2023 07:07
Labels
type/fix Bug fix
Development

Successfully merging this pull request may close these issues.

backfill test, it failed to create watermark mv for 1h timeout on nightly-20231211.
6 participants