Discussion: show the memory usage of individual streaming jobs (MV or sink) #15447

Closed

lmatz opened this issue Mar 5, 2024 · 4 comments

@lmatz (Contributor) commented Mar 5, 2024

The motivation is two-fold:

  1. There are cases where a particular MV/sink consumes a lot of resources due to some problem that is unknown at the moment. Whether or not that problem can eventually be solved, users often just want to remove that particular MV/sink and bring the cluster back to normal for the time being. Showing how much memory each MV/sink uses makes it easy to locate the problematic one. (I don't know whether there is already an existing way to locate the problem; I may be missing something.)

  2. There are cases where too many MVs/sinks exist in the cluster and the workload runs out of memory as a whole, i.e. everything works normally on its own, but the cluster is simply under too much stress. However, it is unclear to users how many is too many. Showing how much memory each MV uses is very intuitive: users can form reasonable expectations of RisingWave and are more likely to be convinced.

github-actions bot added this to the release-1.8 milestone Mar 5, 2024
@st1page (Contributor) commented Mar 5, 2024

"risingwave_dev_dashboard -> Streaming Actors -> Executor Cache Memory Usage of Materialized Views" can monitor the memory usage of the executor's cache.
But the memtable's memory usage is hard to monitor More info here #11442 c.c. @fuyufjh

@lmatz (Contributor, Author) commented Mar 5, 2024

I just checked the code behind "Executor Cache Memory Usage of Materialized Views" / "Executor Cache Memory Usage" and want to confirm that:

  1. they are both estimated and could be inaccurate, e.g. compared with jemalloc (which I suppose is accurate, but also cannot report this metric per MV?). I think the inaccuracy is a minor problem that can be ignored for now; after all, we just want a big picture. (See the sketch after this list for what "estimated" means here.)
  2. we still need to manually sum over the multiple tables that belong to the same query to calculate that query's total memory usage (this is awkward to do within Grafana since the query-to-tables relationship is not available there).
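To make "estimated" concrete, the general technique looks roughly like the sketch below: each cache entry reports an estimated heap footprint, and the cache keeps a running total that is exported as a gauge. The trait, struct, and field names here are illustrative assumptions, not RisingWave's actual API.

```rust
use std::collections::HashMap;

/// Values report an estimated heap footprint. The estimate ignores allocator
/// overhead and fragmentation, which is why the exported gauge can drift from
/// what jemalloc reports for the whole process.
trait EstimatedSize {
    fn estimated_size(&self) -> usize;
}

impl EstimatedSize for String {
    fn estimated_size(&self) -> usize {
        std::mem::size_of::<Self>() + self.capacity()
    }
}

/// A cache that tracks a running total of the estimated size of its entries.
struct EstimatedCache<V: EstimatedSize> {
    entries: HashMap<u64, V>,
    /// Running total of estimated bytes; this is the kind of number an
    /// "Executor Cache Memory Usage"-style panel would plot.
    estimated_bytes: usize,
}

impl<V: EstimatedSize> EstimatedCache<V> {
    fn new() -> Self {
        Self { entries: HashMap::new(), estimated_bytes: 0 }
    }

    fn insert(&mut self, key: u64, value: V) {
        self.estimated_bytes += value.estimated_size();
        if let Some(old) = self.entries.insert(key, value) {
            self.estimated_bytes -= old.estimated_size();
        }
    }

    fn evict(&mut self, key: u64) {
        if let Some(old) = self.entries.remove(&key) {
            self.estimated_bytes -= old.estimated_size();
        }
    }
}
```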

@st1page (Contributor) commented Mar 5, 2024

> they are both estimated and could be inaccurate, e.g. compared with jemalloc (which I suppose is accurate, but also cannot report this metric per MV?). I think the inaccuracy is a minor problem that can be ignored for now; after all, we just want a big picture.

jemalloc does not support per-tenant (I am not sure if that is the proper word...) memory statistics; it can only report the total memory usage of the whole process.
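For reference, a minimal sketch of what jemalloc can report, using the tikv-jemalloc-ctl crate (assumed here purely for illustration; this is not RisingWave's actual monitoring code). The counters are process-wide, with no per-MV/sink/table breakdown:

```rust
// Assumed dependencies (illustrative): tikv-jemallocator = "0.5", tikv-jemalloc-ctl = "0.5"
use tikv_jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    // jemalloc caches its statistics; advancing the epoch refreshes them.
    epoch::advance().unwrap();

    // Both counters cover the whole process: bytes handed out by the allocator
    // and bytes of physically resident memory. There is no way to ask jemalloc
    // how much of this belongs to a particular MV or sink.
    let allocated = stats::allocated::read().unwrap();
    let resident = stats::resident::read().unwrap();
    println!("allocated: {allocated} bytes, resident: {resident} bytes");
}
```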

> we still need to manually sum over the multiple tables that belong to the same query to calculate that query's total memory usage (this is awkward to do within Grafana since the query-to-tables relationship is not available there).

IIRC "Executor Cache Memory Usage of Materialized Views" has sum all the tables with the promQL

@lmatz (Contributor, Author) commented Mar 6, 2024

You are right, I see the `group_left` now; it is effectively a join between an MV and its tables.
Let's close this issue.

lmatz closed this as completed Mar 6, 2024