bug(storage): read block seems to be stuck #15239

Closed
zwang28 opened this issue Feb 26, 2024 · 4 comments
Labels
type/bug Something isn't working

@zwang28
Contributor

zwang28 commented Feb 26, 2024

Describe the bug

In a cluster using Google Cloud Storage, according to await tree,

  • fetch_block sometimes gets stuck, in one case for 5 hours until it was terminated manually.
  • It is also not uncommon for fetch_block to take over 10 seconds to finish.

Error message/log

I will capture the await tree the next time this occurs.

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

v1.6.1

Additional context

  • Heavy workload. A barrier typically takes over one hour to finish even when fetch_block does not get stuck.
  • Over 10k actors per compute node.
  • 40 CPUs and 200 GB of memory per compute node.
@zwang28 zwang28 added the type/bug Something isn't working label Feb 26, 2024
@github-actions github-actions bot added this to the release-1.7 milestone Feb 26, 2024
@hzxa21
Collaborator

hzxa21 commented Feb 26, 2024

What is the value of object_store_read_timeout? By default it is 8 minutes. As in #15209, it seems the timeout did not trigger here either.

@zwang28
Contributor Author

zwang28 commented Feb 26, 2024

object_store_read_timeout

In v1.6.1, object_store_read_timeout is 1 hour, but this particular fetch_block call hung for several hours.
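
For reference, the behaviour a read timeout is expected to provide can be sketched as below. This is only an illustration in Python's asyncio, not RisingWave's or OpenDAL's code: fetch_block here is a hypothetical stand-in that sleeps, and read_with_timeout and run are names made up for this sketch.

```python
import asyncio


async def fetch_block(delay_s: float) -> bytes:
    """Hypothetical stand-in for a block read; sleeps to simulate object store latency."""
    await asyncio.sleep(delay_s)
    return b"block"


async def read_with_timeout(delay_s: float, timeout_s: float) -> str:
    """Bound the entire read by timeout_s and report what happened."""
    try:
        # wait_for cancels the inner coroutine once timeout_s has elapsed.
        await asyncio.wait_for(fetch_block(delay_s), timeout=timeout_s)
        return "ok"
    except asyncio.TimeoutError:
        return "timed out"


def run(delay_s: float, timeout_s: float) -> str:
    """Synchronous helper so the sketch can be tried from a plain script."""
    return asyncio.run(read_with_timeout(delay_s, timeout_s))
```

In this sketch a read that outlives its limit is cancelled and reported as timed out, so no call can run longer than roughly timeout_s. A fetch_block that stays pending for hours under a 1 hour limit is therefore consistent with the timeout not taking effect, as suggested above.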

@Xuanwo
Contributor

Xuanwo commented Feb 26, 2024

OpenDAL has native await-tree support; please enable it to get more information.

@hzxa21
Collaborator

hzxa21 commented Feb 26, 2024

Potential root cause found:
#15209 (comment)

@zwang28 zwang28 closed this as completed Mar 6, 2024