
Exchange: bench, do not preallocate buffers #416

Merged: 11 commits into TimelyDataflow:master from exchange_bench on Nov 20, 2021

Conversation

@antiguru (Member) commented on Aug 18, 2021

This PR:

  • Changes the exchange behavior to only allocate push buffers when encountering data.
  • Adds a benchmark to compare exchange changes.

I compared the current implementation against the changes in this PR, and the benchmark doesn't report significant changes in performance. Obviously, this PR encodes a policy change from eagerly allocated buffers to allocating them on demand, which could have an impact on memory utilization and processing latency.

As explained in the commit message, the cost of allocating buffers exists today as well; it is just incurred during initial construction and, with the default process allocator, after push operations. This PR shifts those allocations to just before the first push into a buffer, which means the absolute number of allocations stays the same. Regarding latency, we now need to check before pushing whether the buffer has the right capacity, which might add overhead. However, I'd hope that the compiler can fold this check into the subsequent test of whether the buffer is full, so I wouldn't expect much of a latency impact either.
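To illustrate the on-demand policy, here is a minimal sketch (not the PR's actual code; `LazyBuffer`, `BUFFER_CAPACITY`, and the pusher-closure shape are assumptions made for this example):

```rust
/// Sketch of a pusher wrapper whose staging buffer starts empty and is
/// only allocated once a record actually arrives.
const BUFFER_CAPACITY: usize = 1024;

struct LazyBuffer<T, P: FnMut(Vec<T>) -> Option<Vec<T>>> {
    buffer: Vec<T>, // starts with zero capacity; no up-front allocation
    pusher: P,      // downstream push; may hand a spent buffer back
}

impl<T, P: FnMut(Vec<T>) -> Option<Vec<T>>> LazyBuffer<T, P> {
    fn new(pusher: P) -> Self {
        // `Vec::new` does not allocate, so construction is allocation-free.
        Self { buffer: Vec::new(), pusher }
    }

    fn push(&mut self, element: T) {
        // Allocate only when we see data and hold no capacity yet.
        if self.buffer.capacity() == 0 {
            self.buffer = Vec::with_capacity(BUFFER_CAPACITY);
        }
        self.buffer.push(element);
        // Flush once the buffer is full.
        if self.buffer.len() == self.buffer.capacity() {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            let full = std::mem::take(&mut self.buffer);
            // Retain a buffer only if the pusher returns one; otherwise
            // stay unallocated until the next `push` needs capacity.
            if let Some(mut returned) = (self.pusher)(full) {
                returned.clear();
                self.buffer = returned;
            }
        }
    }
}
```

Relative to an eagerly allocated buffer, the only change on the hot path is the `capacity() == 0` check at the top of `push`, which sits right next to the fullness test the code already performs.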

@antiguru antiguru force-pushed the exchange_bench branch 2 times, most recently from 268bdff to a6c142e Compare August 23, 2021 15:51
@antiguru antiguru marked this pull request as ready for review August 23, 2021 15:54
@antiguru antiguru changed the title Exchange bench Exchange generalization and bench Aug 23, 2021
@antiguru (Member, Author) commented:

Benchmark results temporarily at https://dev.antiguru.de/criterion/report/

@antiguru (Member, Author) commented:

@Kixiron, @ryzhyk, this PR provides an EagerExchange, which should correspond roughly to what you proposed in #397, but not the Tee changes. It'd be great if you could test this change with your benchmark.

@ryzhyk (Contributor) commented on Sep 7, 2021

> @Kixiron, @ryzhyk, this PR provides an EagerExchange, which should correspond roughly to what you proposed in #397, but not the Tee changes. It'd be great if you could test this change with your benchmark.

@antiguru, very sorry, I missed your message, will give it a try and let you know.

@ryzhyk (Contributor) commented on Sep 9, 2021

@antiguru, I took a close look at the PR. Unfortunately, trying it out is not as easy as just switching to a different timely branch; I'd have to modify our code that constructs the dataflow graph to apply the new pacts to every operator. I apologize, but I won't have the cycles to work on this right now :(

@antiguru antiguru force-pushed the exchange_bench branch 2 times, most recently from 4d9839b to 9f52d13 Compare September 10, 2021 14:10
@antiguru antiguru changed the title Exchange generalization and bench Exchange: bench, do not preallocate buffers Nov 8, 2021
The LazyExchange PACT avoids allocations ahead of time while maintaining
the performance of the eagerly-allocating Exchange PACT. Extend the
Exchange operator with an `exchange_pact` function taking an explicit
PACT.

Signed-off-by: Moritz Hoffmann <[email protected]>
Instead of defining distinct exchange pushers, make the default generic
and provide necessary hooks for specialized exchange pushers.

Signed-off-by: Moritz Hoffmann <[email protected]>
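One way to read "hooks for specialized exchange pushers" is a small policy trait that a single generic pusher consults at the points where a buffer may need to be (re)allocated. The sketch below is written under that assumption; the trait, type, and method names are invented for illustration and are not the PR's API:

```rust
use std::marker::PhantomData;

/// Hypothetical hook trait: specializations decide when buffers are
/// (re)allocated, while one generic pusher handles everything else.
trait AllocationPolicy {
    /// Ensure `buffer` is ready to accept records.
    fn ensure_allocated<T>(buffer: &mut Vec<T>, capacity: usize);
    /// Called after a full buffer has been handed downstream.
    fn after_flush<T>(buffer: &mut Vec<T>, capacity: usize);
}

/// The proposed default: allocate only once data arrives, do nothing
/// extra after a flush.
struct Lazy;
impl AllocationPolicy for Lazy {
    fn ensure_allocated<T>(buffer: &mut Vec<T>, capacity: usize) {
        if buffer.capacity() == 0 {
            *buffer = Vec::with_capacity(capacity);
        }
    }
    fn after_flush<T>(_buffer: &mut Vec<T>, _capacity: usize) {}
}

/// One generic pusher covers all variants instead of distinct pusher types.
struct GenericExchange<T, P: AllocationPolicy, F: FnMut(usize, Vec<T>)> {
    buffers: Vec<Vec<T>>, // one staging buffer per downstream worker
    capacity: usize,
    send: F,              // stand-in for pushing a batch to a worker
    _policy: PhantomData<P>,
}

impl<T, P: AllocationPolicy, F: FnMut(usize, Vec<T>)> GenericExchange<T, P, F> {
    fn give(&mut self, worker: usize, record: T) {
        let buffer = &mut self.buffers[worker];
        P::ensure_allocated(buffer, self.capacity);
        buffer.push(record);
        if buffer.len() == buffer.capacity() {
            let full = std::mem::take(buffer);
            (self.send)(worker, full);
            P::after_flush(&mut self.buffers[worker], self.capacity);
        }
    }
}
```

An eager policy would instead reallocate inside `after_flush`; the `Lazy` policy above matches the behavior this PR proposes as the default.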
I did some measurements comparing:
* the current default,
* a non-preallocating exchange, and
* the same, plus an eagerly releasing exchange.

The benchmark in the chain of commits does not report a significant change
in performance for any of the variants. Erring on the side of caution, I'd
propose making the second variant, i.e. not pre-allocating but retaining a
buffer if push returned one, the default exchange pact.

Currently, non-zerocopy allocators never return an allocation on push, so
in that case the second and third variants are equivalent.

# Why is there no change in performance?

For single-record exchanges, this can be reasoned about as follows. In the
old approach, we'd have an allocated buffer into which we write a single
element. On flush, the buffer would be sent to the pusher, which might or
might not return a buffer. If it doesn't, we'd allocate a new one.

In the new approach, we don't have a buffer to append to, so we'd first
allocate one. On flush, we send the buffer downstream and maybe get one
back. We don't eagerly allocate a new one if there is no buffer available.

Both approaches allocate a new buffer if the pusher didn't return one
earlier; only the order of allocations changes. In the old approach we
send data and then allocate, while in the new approach we allocate and
then send data. On top of this, the new approach pays the cost of checking
the buffer's capacity for each input record, but I'd hope for the
optimizer to sort this out.

The same argument seems to hold for larger batch sizes.

Signed-off-by: Moritz Hoffmann <[email protected]>
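To make the ordering argument concrete, here is a sketch of the two flush paths (invented names, not the PR's code); both perform exactly one allocation per spent batch when the pusher returns nothing, differing only in where that allocation sits relative to the send:

```rust
const CAPACITY: usize = 1024;

/// Old order: send the batch, then immediately allocate a replacement if
/// the pusher did not hand a buffer back.
fn flush_then_allocate<T>(
    buffer: &mut Vec<T>,
    pusher: &mut impl FnMut(Vec<T>) -> Option<Vec<T>>,
) {
    let full = std::mem::take(buffer);
    *buffer = match pusher(full) {
        Some(mut returned) => {
            returned.clear();
            returned
        }
        // Allocate right after sending.
        None => Vec::with_capacity(CAPACITY),
    };
}

/// New order: send the batch and keep whatever comes back; if nothing
/// comes back, defer the allocation until just before the next push.
fn flush_and_defer<T>(
    buffer: &mut Vec<T>,
    pusher: &mut impl FnMut(Vec<T>) -> Option<Vec<T>>,
) {
    let full = std::mem::take(buffer);
    if let Some(mut returned) = pusher(full) {
        returned.clear();
        *buffer = returned;
    }
    // else: `buffer` stays unallocated; the next push allocates on demand.
}
```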
@frankmcsherry (Member) left a comment:


Looks good to me! If you don't mind, adding comments around the two new allocation blocks would make sure we don't forget why we are doing this, and prevent future me from just copying the push() fragment without the allocation test.

Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru merged commit fa63c3a into TimelyDataflow:master Nov 20, 2021
@antiguru antiguru deleted the exchange_bench branch November 20, 2021 01:48