
bug(oom): rdkafka uses too much memory #10888

Closed
fuyufjh opened this issue Jul 11, 2023 · 7 comments
Labels: type/bug (Something isn't working)
Comments

@fuyufjh
Member

fuyufjh commented Jul 11, 2023

Describe the bug

https://buildkite.com/risingwave-test/longevity-kubebench/builds/429#0188fb4e-838b-45f3-85ad-ae0b0e6f9277

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

@fuyufjh fuyufjh added the type/bug Something isn't working label Jul 11, 2023
@github-actions github-actions bot added this to the release-0.19 milestone Jul 11, 2023
@fuyufjh fuyufjh self-assigned this Jul 14, 2023
@fuyufjh fuyufjh modified the milestones: release-0.19, release-1.1 Jul 14, 2023
@fuyufjh
Member Author

fuyufjh commented Jul 20, 2023

Reproduced with MemTable statistics enabled. It seems that MemTable is not the cause.

[Screenshots]

Grafana Link

@fuyufjh
Member Author

fuyufjh commented Jul 21, 2023

Does rdkafka really take this much memory (~6 GB)?

[Screenshot]

cn.1.15.i15_04-47-01.heap.collapsed.zip

Captured before OOM:

[Screenshot]

@fuyufjh
Member Author

fuyufjh commented Jul 24, 2023

After I tweaked some librdkafka parameters, memory usage became quite stable:

[Screenshot]

(However, another bug, #11170, surfaced... 🤣)

I am going to make these parameters configurable in production.
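One way to make such parameters configurable is to expose them as optional knobs that are forwarded to librdkafka only when the user sets them, so unset knobs keep librdkafka's own defaults. This is a hypothetical sketch: the struct `KafkaConsumerTuning` and the method `to_librdkafka_properties` are illustrative names, not RisingWave's code, and this comment does not say which parameters were tweaked. The two keys shown, queued.min.messages and queued.max.messages.kbytes, are real librdkafka properties chosen here as an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical tuning knobs for a Kafka source (illustrative only,
/// not RisingWave's actual configuration type).
#[derive(Debug, Default)]
struct KafkaConsumerTuning {
    /// Forwarded as librdkafka's `queued.min.messages`.
    queued_min_messages: Option<u64>,
    /// Forwarded as librdkafka's `queued.max.messages.kbytes`.
    queued_max_messages_kbytes: Option<u64>,
}

impl KafkaConsumerTuning {
    /// Returns only the properties that were explicitly set, so anything
    /// left as `None` falls back to librdkafka's defaults.
    fn to_librdkafka_properties(&self) -> BTreeMap<String, String> {
        let mut props = BTreeMap::new();
        if let Some(n) = self.queued_min_messages {
            props.insert("queued.min.messages".to_string(), n.to_string());
        }
        if let Some(kb) = self.queued_max_messages_kbytes {
            props.insert("queued.max.messages.kbytes".to_string(), kb.to_string());
        }
        props
    }
}

fn main() {
    // Hypothetical value, chosen only to show the mapping.
    let tuning = KafkaConsumerTuning {
        queued_min_messages: Some(20_000),
        queued_max_messages_kbytes: None,
    };
    for (key, value) in tuning.to_librdkafka_properties() {
        // With the rust-rdkafka crate, each pair would typically be passed
        // to `ClientConfig::set(key, value)`; that step is omitted here.
        println!("{key} = {value}");
    }
}
```

Keeping the knobs optional means the production default stays whatever librdkafka ships with, and only deployments that hit the OOM need to override them.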

@fuyufjh
Member Author

fuyufjh commented Jul 26, 2023

Let me explain the problem and solution here.

The problem was discussed in confluentinc/librdkafka#2076. In short, librdkafka keeps buffering messages in the fetchq (fetch queue) until it holds more than queued.min.messages per partition, and while this happens, memory usage explodes.

I tried decreasing queued.min.messages to 1/5 of its default value, but that reduced performance drastically (throughput dropped to roughly 1/5).
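The trade-off can be illustrated with a back-of-the-envelope estimate of how much data the consumer tries to keep pre-fetched. This is a simplified sketch, not a measurement: it assumes, as librdkafka documents, that queued.min.messages is a per-partition target; it uses hypothetical numbers (64 partitions, 1 KiB messages) and an assumed default of 100,000 for queued.min.messages; and it ignores queued.max.messages.kbytes, per-message overhead, and the overshoot discussed in librdkafka#2076. The function name `estimate_prefetch_bytes` is hypothetical.

```rust
/// Rough pre-fetch target in bytes when librdkafka aims to keep
/// `queued_min_messages` messages in each partition's queue.
/// Simplified model (assumption): ignores `queued.max.messages.kbytes`,
/// per-message overhead, and any overshoot beyond the target.
fn estimate_prefetch_bytes(partitions: u64, queued_min_messages: u64, avg_message_bytes: u64) -> u64 {
    partitions * queued_min_messages * avg_message_bytes
}

fn main() {
    // Hypothetical workload parameters.
    let partitions: u64 = 64;
    let avg_message_bytes: u64 = 1024;
    // Assumed librdkafka default for queued.min.messages.
    let default_min: u64 = 100_000;
    let reduced_min = default_min / 5;

    // Convert bytes to GiB for display.
    let gib = |b: u64| b as f64 / (1u64 << 30) as f64;
    let full = estimate_prefetch_bytes(partitions, default_min, avg_message_bytes);
    let reduced = estimate_prefetch_bytes(partitions, reduced_min, avg_message_bytes);
    println!("default: {:.2} GiB", gib(full)); // prints "default: 6.10 GiB"
    println!("1/5:     {:.2} GiB", gib(reduced)); // prints "1/5:     1.22 GiB"
}
```

Because the estimate is linear in queued.min.messages, cutting it to 1/5 cuts the pre-fetch target by the same factor. It also leaves the consumer with 1/5 as much data buffered ahead of processing, which is one plausible reason throughput fell roughly in step.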

I have a few ideas for further tuning, but that would be very time-consuming; it is better to do it after we have statistics (#11187). For now, users who encounter this issue can use #11203 as a workaround.

@fuyufjh fuyufjh changed the title bug(oom): Failed to run Nexmark Q1-Q20 bug(oom): rdkafka uses too much memory Aug 1, 2023
@tabVersion
Contributor

Since #11273 is merged, let's see if it helps.

@fuyufjh
Member Author

fuyufjh commented Aug 2, 2023

> since #11273 is merged, let's see if it helps

Reran here: https://buildkite.com/risingwave-test/longevity-kubebench/builds/542

@fuyufjh
Member Author

fuyufjh commented Aug 8, 2023

In the latest test run, the CN (compute node) appears to run without OOM, so let me close this for now.

https://buildkite.com/risingwave-test/longevity-kubebench/builds/550
