Compute Node OOM in longevity test with 20000 throughput #13326

Closed
wcy-fdu opened this issue Nov 9, 2023 · 3 comments
Labels
type/bug Something isn't working
Milestone

Comments

@wcy-fdu
Contributor

wcy-fdu commented Nov 9, 2023

Describe the bug

The compute node (CN) hits OOM in the longevity test at 20,000 (2w) throughput. We originally thought this was caused by the mem table of the materialized operator growing too large, but it turns out that the cause is rdkafka consuming too much memory.

The memory used by the materialized executor is no longer very high, so the remaining problem to solve is rdkafka.
Heap file: 2w-oom-cn1-0918.collapsed.zip

Related to #11977 #13060
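
For anyone who wants to check the attribution in the attached heap file, here is a minimal sketch of how a collapsed profile could be summarised by component. It assumes the file uses the usual folded-stack layout (`frame1;frame2;...;frameN <weight>` per line); the `COMPONENTS` substrings, the function name, and the sample stacks are hypothetical and not taken from the actual profile.

```python
from collections import Counter

# Hypothetical mapping from a label to substrings that identify it in a frame.
COMPONENTS = {
    "rdkafka": ("rdkafka", "rd_kafka"),
    "materialize": ("materialize",),
}

def weight_by_component(collapsed_lines, components=COMPONENTS):
    """Sum the weight of each stack under the first component it matches.

    Each line is assumed to look like `frame1;frame2;...;frameN <weight>`.
    Stacks that match no component are counted under "other".
    """
    totals = Counter()
    for line in collapsed_lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space so frames that contain spaces stay intact.
        stack, _, weight = line.rpartition(" ")
        if not stack or not weight.isdigit():
            continue  # skip malformed lines
        lowered = stack.lower()
        label = "other"
        for name, needles in components.items():
            if any(needle in lowered for needle in needles):
                label = name
                break
        totals[label] += int(weight)
    return totals

# Made-up sample stacks, only to show the input shape.
sample = [
    "main;rd_kafka_thread_main;rd_kafka_buf_new 700",
    "main;MaterializeExecutor::execute;alloc 200",
    "main;tokio::runtime;alloc 100",
]
totals = weight_by_component(sample)
```

On a real file, pass the open file object (after unzipping) instead of `sample`; the weights are whatever unit the profiler emitted, typically bytes for a heap profile.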

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

@wcy-fdu wcy-fdu added the type/bug Something isn't working label Nov 9, 2023
@github-actions github-actions bot added this to the release-1.5 milestone Nov 9, 2023
@fuyufjh
Member

fuyufjh commented Nov 10, 2023

I previously tried to fix this by adjusting rdkafka's fetch queue parameters, but it didn't work.

https://github.com/risingwavelabs/kube-bench/compare/main...eric/limit_rdkakfa_fetch_queue_size

If anyone is interested, feel free to continue working on this. To run with this kube-bench branch, set the following env var on the BuildKite job: KUBEBENCH_BRANCH="eric/limit_rdkakfa_fetch_queue_size" (Example)

Also related: #10888
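
For context, a rough sketch of what "fetch queue parameters" could refer to. The property names below are documented librdkafka consumer settings; the values are purely illustrative and are not the ones used in the branch above, and `num_consumers` and the function name are hypothetical. librdkafka documents that `queued.max.messages.kbytes` may be overshot by `fetch.message.max.bytes`, so the estimate adds the two. It covers only the pre-fetch queue, not other rdkafka buffers, which may be part of why tuning these alone did not help.

```python
# Illustrative values only; NOT the settings from the kube-bench branch.
fetch_queue_props = {
    "queued.min.messages": 1000,          # messages librdkafka tries to keep queued
    "queued.max.messages.kbytes": 16384,  # cap on pre-fetched data, in KiB
    "fetch.message.max.bytes": 1048576,   # initial per-partition fetch size, in bytes
}

def prefetch_bound_bytes(props, num_consumers):
    """Rough upper bound on pre-fetched bytes across consumers.

    Assumes one independent queue per consumer instance and ignores every
    other allocation made by the client.
    """
    per_consumer = (
        props["queued.max.messages.kbytes"] * 1024
        + props["fetch.message.max.bytes"]
    )
    return per_consumer * num_consumers

# Hypothetical: 8 consumer instances on one compute node.
bound = prefetch_bound_bytes(fetch_queue_props, num_consumers=8)
print(f"{bound / 2**20:.0f} MiB")
```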

@wcy-fdu
Contributor Author

wcy-fdu commented Nov 10, 2023

Update: the occasional OOM at 10,000 (1w) throughput is also caused by this (rdkafka).
Heap file: 1w-oom-1109-1034.heap.collapsed.zip

@wcy-fdu
Contributor Author

wcy-fdu commented Nov 23, 2023

Can we close this issue? cc @fuyufjh

@wcy-fdu wcy-fdu closed this as completed Nov 29, 2023

2 participants