Discussion: show the memory usage of individual streaming jobs (MV or sink) #15447

Closed

lmatz opened this issue Mar 5, 2024 · 4 comments

@lmatz (Contributor) commented Mar 5, 2024

The motivation is two-fold:

  1. There are cases where a particular MV/sink consumes a lot of resources due to some problem that is unknown at the moment. Whether or not that problem can eventually be solved, users often just want to remove that particular MV/sink and bring the cluster back to normal for the time being. Showing how much memory each MV/sink uses makes it easy to locate the problematic one. (I don't know whether there is already an existing way to locate the problem; I may be missing something.)

  2. There are cases where too many MVs/sinks exist in the cluster and the workload runs out of memory as a whole, i.e. everything works normally on its own, but the cluster is simply under too much stress. However, it is unclear to users how many is too many. Showing how much memory each MV uses is very intuitive: users can form reasonable expectations of RisingWave and are more likely to be convinced.

github-actions bot added this to the release-1.8 milestone Mar 5, 2024
@st1page (Contributor) commented Mar 5, 2024

"risingwave_dev_dashboard -> Streaming Actors -> Executor Cache Memory Usage of Materialized Views" can monitor the memory usage of the executor's cache.
But the memtable's memory usage is hard to monitor More info here #11442 c.c. @fuyufjh

@lmatz (Contributor, Author) commented Mar 5, 2024

I just checked the code behind "Executor Cache Memory Usage of Materialized Views" / "Executor Cache Memory Usage" and want to confirm that:

  1. they are both estimated and could be inaccurate, e.g. compared with jemalloc (which I suppose is accurate, but also cannot report this metric per MV?). I think the inaccuracy is a minor problem that can be ignored for now; after all, we just want a big picture. (See the sketch after this list for what "estimated" means here.)
  2. we still need to manually sum over the multiple tables that belong to the same query to calculate that query's total memory usage (this is awkward to do within Grafana since the query-to-tables relationship is not available there).
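To make "estimated" concrete, the general technique looks roughly like the sketch below: each cache entry reports an estimated heap footprint, and the cache keeps a running total that is exported as a gauge. The trait, struct, and field names here are illustrative assumptions, not RisingWave's actual API.

```rust
use std::collections::HashMap;

/// Values report an estimated heap footprint. The estimate ignores allocator
/// overhead and fragmentation, which is why the exported gauge can drift from
/// what jemalloc reports for the whole process.
trait EstimatedSize {
    fn estimated_size(&self) -> usize;
}

impl EstimatedSize for String {
    fn estimated_size(&self) -> usize {
        std::mem::size_of::<Self>() + self.capacity()
    }
}

/// A cache that tracks a running total of the estimated size of its entries.
struct EstimatedCache<V: EstimatedSize> {
    entries: HashMap<u64, V>,
    /// Running total of estimated bytes; this is the kind of number an
    /// "Executor Cache Memory Usage"-style panel would plot.
    estimated_bytes: usize,
}

impl<V: EstimatedSize> EstimatedCache<V> {
    fn new() -> Self {
        Self { entries: HashMap::new(), estimated_bytes: 0 }
    }

    fn insert(&mut self, key: u64, value: V) {
        self.estimated_bytes += value.estimated_size();
        if let Some(old) = self.entries.insert(key, value) {
            self.estimated_bytes -= old.estimated_size();
        }
    }

    fn evict(&mut self, key: u64) {
        if let Some(old) = self.entries.remove(&key) {
            self.estimated_bytes -= old.estimated_size();
        }
    }
}
```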

@st1page (Contributor) commented Mar 5, 2024

> they are both estimated and could be inaccurate, e.g. compared with jemalloc (which I suppose is accurate, but also cannot report this metric per MV?). I think the inaccuracy is a minor problem that can be ignored for now; after all, we just want a big picture.

jemalloc does not support per-tenant (I am not sure if that is the proper word...) memory statistics; it can only report the total memory usage of the whole process.
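For reference, a minimal sketch of what jemalloc can report, using the tikv-jemalloc-ctl crate (assumed here purely for illustration; this is not RisingWave's actual monitoring code). The counters are process-wide, with no per-MV/sink/table breakdown:

```rust
// Assumed dependencies (illustrative): tikv-jemallocator = "0.5", tikv-jemalloc-ctl = "0.5"
use tikv_jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    // jemalloc caches its statistics; advancing the epoch refreshes them.
    epoch::advance().unwrap();

    // Both counters cover the whole process: bytes handed out by the allocator
    // and bytes of physically resident memory. There is no way to ask jemalloc
    // how much of this belongs to a particular MV or sink.
    let allocated = stats::allocated::read().unwrap();
    let resident = stats::resident::read().unwrap();
    println!("allocated: {allocated} bytes, resident: {resident} bytes");
}
```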

> we still need to manually sum over the multiple tables that belong to the same query to calculate that query's total memory usage (this is awkward to do within Grafana since the query-to-tables relationship is not available there).

IIRC "Executor Cache Memory Usage of Materialized Views" has sum all the tables with the promQL

@lmatz (Contributor, Author) commented Mar 6, 2024

You are right, I see the `group_left` now; it is effectively a join between an MV and its tables.
Let's close this issue.

lmatz closed this as completed Mar 6, 2024