Lazy memory allocation in channel pushers. #394
Comments
If this being the default behavior is a deal breaker, it can always be gated either behind a feature flag or behind a timely …
Sorry for the delay, I was out / am overwhelmed. :) The thing you describe sounds like an issue we should fix. If you have a patch, we could totally look at it. It would be a performance regression for some use cases (those that fill these buffers in steady state). Do you have a take on whether your graphs (1) alternate between filling the buffers and then idling on empty buffers, or (2) don't fill the buffers, so that a smaller default / adaptive approach would be better, or maybe a bit of both?
Another option, though I don't know for certain: this could just be a different pact implementation.
There's also (and I'm not sure I want to be that guy) the possibility that making hundreds of independent copies of a large dataflow graph is just an anti-pattern in TD. I apologize if you have explained this before, but can you explain why this can't be one copy of the large dataflow graph, with keys on the data to indicate which logical instance they belong to?
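For illustration, a minimal sketch of that keyed, single-dataflow alternative using timely's standard operators (`to_stream`, `map`, `inspect`); the tenant key and the values are invented for the example and are not code from this project:

```rust
// Sketch of one shared dataflow where every record carries a key identifying
// the logical instance it belongs to, instead of building hundreds of
// independent copies of the graph. Tenant ids and payloads are made up.
use timely::dataflow::operators::{Inspect, Map, ToStream};

fn main() {
    timely::example(|scope| {
        (0..10u64)
            .map(|i| (i % 3, i))                // (tenant_id, payload): three logical instances
            .to_stream(scope)
            .map(|(tenant, x)| (tenant, x * 2)) // every operator keeps the key on the data
            .inspect(|(tenant, x)| println!("tenant {}: {}", tenant, x));
    });
}
```

Every operator has to carry the key along explicitly, which is exactly the burden (and leak risk) discussed in the reply below.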
I understand this is true in theory, but I wonder if this actually happens in practice.
I do. I'll clean it up and submit it as a PR, thanks! I can also try implementing @Kixiron's suggestion and make this a configuration option (which will break backwards compatibility for existing timely applications and add some extra forks in the code).
It's definitely 1. And I did try with a smaller default, which helps, but a lot of memory still gets wasted.
There are several reasons. The most important one is programmability/security. Our users write hundreds of (ddlog) rules and it would be a pain for them to use the extra key in all rules. And even if we provided some idioms to help them get this right, a small typo would cause data to flow across different logical instances, which in the real world translates to sensitive data leaking across different external customers, which is the absolute worst nightmare for the folks we're working with. In fact, we are planning to implement even stronger isolation by encapsulating each dataflow in a separate process. Aside from this, scalability is another concern. @Kixiron managed to improve the scalability of DDlog-generated code with the number of workers, but it's still nowhere near as good as the near-linear scaling we get when using multiple separate dataflows.
The current implementation of channel pushers preallocates `Message::default_length` entries per pusher. For very large graphs this adds up to a lot of memory being allocated even when the system is idle with no outstanding messages. This patch changes the allocation policy to only allocate channel memory when there are messages to send and to deallocate it when there are no messages, reducing the memory footprint to 0 in idle state at the cost of some potential slow-down due to a larger number of allocations. See TimelyDataflow#394.
Created a draft PR: #397
The current implementation of channel pushers preallocates `Message::default_length` entries per pusher. For very large graphs this adds up to a lot of memory being allocated even when the system is idle with no outstanding messages. This patch changes the allocation policy to only allocate channel memory when there are messages to send and to deallocate it at the end of a burst of messages (signaled by pushing a `None` message), reducing the memory footprint to 0 in idle state at the cost of some potential slow-down due to a larger number of allocations. See TimelyDataflow#394.
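As a rough illustration of that policy, here is a sketch (not timely's actual pusher internals; the `Push` trait stand-in and the `LazyBufferPusher` type are invented for this example) in which the buffer is allocated only when the first record of a burst arrives and released again when a `None` flush is pushed:

```rust
/// Illustrative stand-in for a pusher interface: `Some(record)` pushes a
/// record, `None` marks the end of a burst (a flush).
trait Push<T> {
    fn push(&mut self, element: &mut Option<T>);
}

/// A pusher that buffers records before handing them to a downstream pusher.
/// The buffer is allocated lazily and dropped at the end of each burst,
/// so an idle channel holds no memory.
struct LazyBufferPusher<T, P: Push<Vec<T>>> {
    buffer: Vec<T>,   // empty (capacity 0) whenever the channel is idle
    chunk: usize,     // allocation granularity, e.g. Message::default_length
    downstream: P,
}

impl<T, P: Push<Vec<T>>> Push<T> for LazyBufferPusher<T, P> {
    fn push(&mut self, element: &mut Option<T>) {
        match element.take() {
            Some(record) => {
                // Allocate a chunk only when the first record of a burst arrives.
                if self.buffer.capacity() == 0 {
                    self.buffer.reserve(self.chunk);
                }
                self.buffer.push(record);
                if self.buffer.len() == self.chunk {
                    self.flush();
                }
            }
            None => {
                // End of burst: forward whatever is buffered and release the
                // allocation, returning the idle footprint to zero.
                self.flush();
                self.buffer = Vec::new();
            }
        }
    }
}

impl<T, P: Push<Vec<T>>> LazyBufferPusher<T, P> {
    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            // `mem::take` hands the filled buffer downstream and leaves an
            // unallocated Vec in its place.
            let mut batch = Some(std::mem::take(&mut self.buffer));
            self.downstream.push(&mut batch);
        }
    }
}
```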
We have an application that builds a very large dataflow graph (thousands of operators) and creates hundreds of concurrently running instances of this graph. Heap profiling showed that the application wastes a lot of memory in channel pushers, which maintain `Message::default_length` entries even when the system is idle with no outstanding messages. I understand that this is a performance optimization that reduces the number of `malloc` calls, but the memory overhead was not acceptable for our use case, so I created a patch that changes the allocation policy to only allocate channel memory (in chunks of `Message::default_length` entries) when there are messages to send and to deallocate it when there are no messages, reducing the memory footprint to 0 in the idle state: ddlog-dev@a94e176

@frankmcsherry, any chance you'd accept a cleaned-up version of this patch as a PR? I realize that it caters to a somewhat idiosyncratic use case, but it doesn't seem to cause a measurable slowdown in my experiments and it does save memory.
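To make the "adds up to a lot of memory" point concrete, here is a back-of-envelope sketch with entirely hypothetical numbers (the instance count, pusher count, and record size below are assumptions, not measurements from this application):

```rust
// Back-of-envelope estimate of the idle preallocation cost.
// All figures are hypothetical and only illustrate how the terms multiply.
fn main() {
    let instances = 200usize;          // concurrently running graph copies (hypothetical)
    let pushers_per_instance = 5_000;  // channel pushers per graph copy (hypothetical)
    let default_length = 1_024;        // assumed Message::default_length
    let bytes_per_entry = 32;          // assumed size of one buffered record

    let idle_bytes = instances * pushers_per_instance * default_length * bytes_per_entry;
    // With these assumptions: 200 * 5_000 * 1_024 * 32 bytes ~ 30.5 GiB of
    // buffers sitting allocated while the system is completely idle.
    println!("idle preallocation ~ {:.1} GiB", idle_bytes as f64 / (1u64 << 30) as f64);
}
```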