doc: add doc on rate limit and backpressure #16450
Conversation
docs/rate-limit-and-backpressure.md (Outdated)
The data between barriers takes a long time to process.
This can be due to high amplification, UDF latency, or other reasons.

By default, Barriers have an interval of 256ms.
It should be 1s.
risingwave/src/config/example.toml, line 208 (at 3a08273):
barrier_interval_ms = 1000
However, the channel will not immediately backpressure A and B.
It will only do so once it is full.

This means that the stream chunks between barriers are equivalent to channel capacity.
Since we process streaming data in batches, there have to be some records buffered in each executor (typically within the chunk builder, as local variables). I suppose these also act as channels in this sense.
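To make that buffering point concrete, here is a minimal, hypothetical sketch (the struct and numbers are made up for illustration and are not RisingWave's actual chunk builder): rows accumulate locally until a chunk is full, so they add buffering on top of whatever the exchange channels hold.

```rust
// A toy chunk builder: rows accumulate in a local buffer and are only emitted
// downstream once `capacity` rows have been collected. Rows sitting in this
// buffer are not in any exchange channel yet, so they effectively add to the
// amount of data buffered between operators.
struct ChunkBuilder {
    capacity: usize,
    buffered: Vec<i64>, // stand-in for real rows
}

impl ChunkBuilder {
    fn new(capacity: usize) -> Self {
        Self { capacity, buffered: Vec::with_capacity(capacity) }
    }

    /// Push one row; a full chunk is returned only when the buffer fills up.
    fn push(&mut self, row: i64) -> Option<Vec<i64>> {
        self.buffered.push(row);
        if self.buffered.len() >= self.capacity {
            Some(std::mem::take(&mut self.buffered))
        } else {
            None
        }
    }
}

fn main() {
    let mut builder = ChunkBuilder::new(256);
    let mut emitted = 0;
    for row in 0..1000 {
        if let Some(chunk) = builder.push(row) {
            emitted += chunk.len();
        }
    }
    // 768 rows were emitted in 3 full chunks; 232 rows are still buffered locally.
    println!("emitted: {emitted}, still buffered: {}", builder.buffered.len());
}
```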
### Backfill

In a newly created MV, it will undergo Backfilling, to fill the MV with historical data and fresh data concurrently.
After creating new jobs, we'll naturally have more channels to fill before back-pressure takes effect. This might also be a reason why we see a temporary increase in barrier latency.
This can be due to high amplification, UDF latency, or other reasons.

By default, Barriers have an interval of 1s.
If records between those barriers take longer than 256ms to process, the barrier latency will increase by that difference.
Can you elaborate more on why 256ms? There seems to be little context for it here.
+1, didn't get it
LGTM except for some barrier interval mismatches in examples
The barrier latency for Y will be 0ms, since our barrier interval is 256ms,
and the time between X and Y is less than 256ms.
Does this example assume barrier_interval to be 256ms? Why not the default value 1000ms?
_example.toml_

```toml
[streaming.developer]
stream_exchange_initial_permits = 2048
```
Suggested change: add a comment above the config line.

# The initial permits that a channel holds, i.e., the maximum row count that can be buffered in the channel.
stream_exchange_initial_permits = 2048
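To make "permits" concrete, here is a minimal, hypothetical sketch of permit-based backpressure using tokio (assumed semantics for illustration only; the real exchange protocol differs in its details): the sender must acquire one permit per row before sending, and the receiver returns permits only after it has processed a chunk, so a slow downstream naturally stalls the upstream.

```rust
// Sketch of permit-based backpressure between one sender and one receiver.
// `initial_permits` bounds how many rows may be in flight at once.
use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

#[tokio::main]
async fn main() {
    let initial_permits = 2048; // cf. stream_exchange_initial_permits
    let permits = Arc::new(Semaphore::new(initial_permits));
    let (tx, mut rx) = mpsc::unbounded_channel::<Vec<u64>>();

    // Sender: acquire permits proportional to chunk cardinality before sending.
    let sender_permits = permits.clone();
    let sender = tokio::spawn(async move {
        for chunk_id in 0..100u64 {
            let chunk: Vec<u64> = (0..256).map(|i| chunk_id * 256 + i).collect();
            let n = chunk.len() as u32;
            // Stalls here (i.e. is backpressured) once permits run out.
            sender_permits.acquire_many(n).await.unwrap().forget();
            tx.send(chunk).unwrap();
        }
    });

    // Receiver: process chunks slowly, then hand the permits back to the sender.
    while let Some(chunk) = rx.recv().await {
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
        permits.add_permits(chunk.len());
    }
    sender.await.unwrap();
}
```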
### Considering Amplification and UDF latency

When there's high amplification or UDF latency in some downstream actor, upstream will naturally be backpressured.
"Amplification" is not a widely-used term, although we are all know it. Let's add some introduction of it.
Amplification refers to a common phenomenon in streaming. For example, consider the following Join:
SELECT * FROM users JOIN orders ON orders.user_id = users.id
We know that each user often has multiple orders, and a minority of users may have thousands of orders. If a field in the user table is updated, then reflected in the results of the above JOIN query, there could be thousands of changed rows. Such amplification can have unexpected performance impacts on streaming jobs, and therefore requires special attention.
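As a rough illustration of the fan-out (hypothetical data, not RisingWave code), a single update on the `users` side of that join re-emits one changed row per matching order:

```rust
use std::collections::HashMap;

fn main() {
    // Orders indexed by user_id, roughly how a hash join keeps its build side.
    let mut orders_by_user: HashMap<u64, Vec<String>> = HashMap::new();
    orders_by_user.insert(1, vec!["order-1".to_string()]);
    orders_by_user.insert(2, (0..5000).map(|i| format!("order-{i}")).collect());

    // One update to user 2 must re-emit every joined row for that user.
    let updated_user_id = 2;
    let amplified = orders_by_user.get(&updated_user_id).map_or(0, |o| o.len());
    println!("1 upstream update -> {amplified} downstream updates");
}
```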
This PR has been open for 60 days with no activity.
Close this PR as there are no further actions taken after it was marked as stale for 7 days. Sorry! 🙏 You can reopen it when you have time to continue working on it.
Any updates?
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Checklist
./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.