compaction task hangs without error #15209
Comments
Searched for the log covering the full run of the test. I didn't see it.
Another issue where a compute node hangs while reading a block: #15239
I can't read the logs posted in the comments, so I don't know what happened yet. Here are the ideas I have:
risingwave/src/object_store/src/object/s3.rs Lines 552 to 567 in e19aaec
It confuses me that the timeout does not work when the IO hangs. @Xuanwo
See details in https://github.com/risingwavelabs/risingwave/blob/main/src/object_store/src/object/mod.rs#L465
The only opendal logs we saw in the testing are related to retry. However, I don't think the retry logs are relevant to this issue, because a retry indicates progress on the I/O request, whereas what we saw in this issue is that some I/O requests are stuck forever. As mentioned by @Little-Wallace, we do enable a timeout on top of each I/O call by wrapping it.
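For illustration only, here is a minimal sketch of the kind of per-call timeout wrapping described above, using tokio::time::timeout. The types and function names are hypothetical and not the actual RisingWave object-store API; the linked mod.rs code is the authoritative implementation.

```rust
use std::time::Duration;

use tokio::time::{sleep, timeout};

// Hypothetical error type used only for this sketch.
#[derive(Debug)]
enum ObjectError {
    Timeout,
}

// Stand-in for the real S3 / opendal read call; here it simply hangs
// to demonstrate the timeout firing.
async fn read_object(_path: &str) -> Result<Vec<u8>, ObjectError> {
    sleep(Duration::from_secs(3600)).await;
    Ok(Vec::new())
}

// Wrap a single I/O call in a timeout so a request that makes no progress
// is surfaced as an error instead of hanging forever.
async fn read_with_timeout(path: &str, dur: Duration) -> Result<Vec<u8>, ObjectError> {
    match timeout(dur, read_object(path)).await {
        Ok(res) => res,
        // Deadline elapsed before the wrapped future completed.
        Err(_elapsed) => Err(ObjectError::Timeout),
    }
}

#[tokio::main]
async fn main() {
    let res = read_with_timeout("s3://bucket/key", Duration::from_secs(2)).await;
    println!("{res:?}"); // prints Err(Timeout) after ~2s
}
```

If the real code is wrapped like this, the open question in this thread is why such a deadline did not fire for the stuck requests.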
Encountered blocking of read and upload at the same time.
Root cause found: |
Can we close this issue?
Please ping me when this happens again.
Recently in our testing pipeline, we found that some compaction tasks hang for a long time (>45 min) without any error. Although the meta-side task expiration mechanism can make sure a task will be force-cancelled if no progress has been made for some time, it is worth investigating why the task can hang in the first place.
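As a rough sketch of the expiration idea mentioned above (hypothetical names, not the actual meta-node code): the meta side tracks when a task last reported progress and force-cancels it once that gap exceeds a threshold.

```rust
use std::time::{Duration, Instant};

// Hypothetical bookkeeping for a running compaction task, used only to
// illustrate progress-based expiration.
struct TaskProgress {
    last_progress_at: Instant,
}

impl TaskProgress {
    fn new() -> Self {
        Self { last_progress_at: Instant::now() }
    }

    // Called whenever the compactor reports progress (e.g. via heartbeat).
    fn record_progress(&mut self) {
        self.last_progress_at = Instant::now();
    }

    // Checked periodically by the meta side; expired tasks get cancelled.
    fn is_expired(&self, max_idle: Duration) -> bool {
        self.last_progress_at.elapsed() > max_idle
    }
}

fn main() {
    let mut task = TaskProgress::new();
    task.record_progress();
    // In the real system this check runs on a timer; a task with no
    // progress within `max_idle` would be force-cancelled.
    if task.is_expired(Duration::from_secs(45 * 60)) {
        println!("cancel task");
    }
}
```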
Note that the recent occurrences all happened in test runs that were switching to opendal s3: