Exchange: bench, do not preallocate buffers #416
Conversation
Benchmark results are temporarily at https://dev.antiguru.de/criterion/report/
@antiguru, I took a close look at the PR. Unfortunately, trying it out is not as easy as just switching to a different timely branch; I'd have to modify our code that constructs the dataflow graph to apply the new pacts to every operator. I apologize, but unfortunately I won't have the cycles to work on this now :(
The LazyExchange PACT avoids allocations ahead of time while maintaining the performance of the eagerly allocating Exchange PACT. Extend the Exchange operator with an `exchange_pact` function that takes an explicit PACT. Signed-off-by: Moritz Hoffmann <[email protected]>
Instead of defining distinct exchange pushers, make the default generic and provide necessary hooks for specialized exchange pushers. Signed-off-by: Moritz Hoffmann <[email protected]>
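The hook-based design the commit message describes can be illustrated with a minimal std-only sketch. All names here (`BufferPolicy`, `Eager`, `Lazy`) are assumptions for illustration, not timely's actual types: the idea is that one generic pusher delegates its buffer-allocation decisions to a small policy trait, so specialized exchange pushers only override the hooks.

```rust
// Hypothetical sketch, not timely's actual code: a policy trait supplying
// the allocation hooks that a single generic exchange pusher would call.
trait BufferPolicy<T> {
    /// Called when a buffer is needed before pushing a record.
    fn allocate(&self, capacity: usize) -> Vec<T>;
    /// Called after a flush; `returned` is whatever the downstream
    /// pusher handed back, possibly nothing.
    fn recycle(&self, returned: Option<Vec<T>>) -> Option<Vec<T>>;
}

struct Eager; // models the existing default: always hold a buffer
impl<T> BufferPolicy<T> for Eager {
    fn allocate(&self, capacity: usize) -> Vec<T> {
        Vec::with_capacity(capacity)
    }
    fn recycle(&self, returned: Option<Vec<T>>) -> Option<Vec<T>> {
        // Eagerly re-allocate if the downstream pusher returned nothing.
        Some(returned.unwrap_or_else(|| Vec::with_capacity(1024)))
    }
}

struct Lazy; // models the non-preallocating variant
impl<T> BufferPolicy<T> for Lazy {
    fn allocate(&self, capacity: usize) -> Vec<T> {
        Vec::with_capacity(capacity)
    }
    fn recycle(&self, returned: Option<Vec<T>>) -> Option<Vec<T>> {
        // Retain a returned buffer, but never allocate ahead of time.
        returned
    }
}
```

With this shape, the eager and lazy pushers share all routing and flushing code and differ only in the two hook implementations.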
I did some measurements comparing:

* the current default,
* a non-preallocating exchange,
* the same, plus an eagerly releasing exchange.

The benchmark in this chain of commits does not report a significant change in performance for any variant. Erring on the side of caution, I'd propose making the second variant, i.e., not pre-allocating but retaining a buffer if `push` returned one, the default exchange pact. Currently, none of the non-zerocopy allocators return an allocation on `push`, so for them the second and third variants are equivalent.

# Why is there no change in performance?

For single-record exchanges, this can be reasoned about as follows. In the old approach, we'd have a pre-allocated buffer into which we write a single element. On flush, the buffer would be sent to the pusher, which might or might not return a buffer; if it doesn't, we'd allocate a new one. In the new approach, we don't have a buffer to append to, so we first allocate one. On flush, we send the buffer downstream and maybe get one back, but we don't eagerly allocate a new one if no buffer is available. Both approaches allocate a new buffer whenever the pusher didn't return one earlier; only the order of operations changes. Where the old approach sends data and then allocates, the new approach allocates and then sends data. On top of this, the new approach pays the cost of checking the buffer's capacity for each input record, but I'd hope the optimizer sorts this out. The same argument seems to hold for larger batch sizes.
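The new push path described above can be sketched in a few lines of std-only Rust. The names (`LazyBuffer`, `CAPACITY`, the `Vec<Vec<T>>` standing in for a downstream pusher) are assumptions for illustration, not timely's API; the point is the order of operations: allocate only when a record arrives and no buffer is held, and end a flush with an empty, zero-capacity buffer rather than eagerly re-allocating.

```rust
// Illustrative sketch of the non-preallocating push path (assumed names,
// not timely's actual code).
const CAPACITY: usize = 4;

struct LazyBuffer<T> {
    buffer: Vec<T>,     // empty with zero capacity until the first push
    allocations: usize, // counts fresh allocations, for illustration only
}

impl<T> LazyBuffer<T> {
    fn new() -> Self {
        Self { buffer: Vec::new(), allocations: 0 }
    }

    fn push(&mut self, record: T, downstream: &mut Vec<Vec<T>>) {
        // Allocate only when a record actually arrives and no buffer is
        // held; this is the per-record capacity check discussed above.
        if self.buffer.capacity() == 0 {
            self.buffer = Vec::with_capacity(CAPACITY);
            self.allocations += 1;
        }
        self.buffer.push(record);
        if self.buffer.len() >= CAPACITY {
            self.flush(downstream);
        }
    }

    fn flush(&mut self, downstream: &mut Vec<Vec<T>>) {
        if !self.buffer.is_empty() {
            // Send the buffer downstream. A real pusher might hand a
            // buffer back, in which case we would retain it; here the
            // stand-in never does, so we are left with an empty Vec and
            // do NOT eagerly allocate a replacement.
            downstream.push(std::mem::take(&mut self.buffer));
        }
    }
}
```

Note that `std::mem::take` leaves behind `Vec::new()`, whose capacity is zero, so the next `push` triggers the on-demand allocation branch again.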
Looks good to me! If you don't mind, adding comments around the two new allocation blocks would make sure we don't forget why we are doing this, and would prevent future me from just copying the `push()` fragment without the allocation test.
This PR:
I compared the current implementation against the changes in this PR, and the benchmark doesn't report significant changes in performance. Obviously, this PR encodes a policy change from eagerly allocated buffers to buffers allocated on demand, which could have an impact on memory utilization and processing latency.
As explained in the commit message, the cost of allocating buffers exists today; it is just paid during initial construction and, with the default process allocator, after push operations. We shift those allocations to just before the first buffer push, which means that the absolute number of allocations stays the same. Regarding latency, we need to check before pushing to a buffer whether it has the right capacity, which might add overhead. However, I'd hope that the compiler can fold this check into the subsequent test of whether the buffer is full, so I wouldn't expect much of a latency impact either.
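The claim that the absolute number of allocations stays the same can be checked with a tiny simulation. This is an illustration under stated assumptions, not a measurement of timely itself: the downstream pusher never returns a buffer (the non-zerocopy case), and the eager policy re-allocates immediately after each flush while the lazy policy waits for the next record.

```rust
// Count buffer allocations under the eager and the on-demand policy for
// the same input stream. Assumes a downstream that never returns buffers.
fn allocations(eager: bool, records: usize, capacity: usize) -> usize {
    let mut allocs = 0;
    let mut have_buffer = false;
    if eager {
        // Eager policy: allocate at construction time.
        have_buffer = true;
        allocs += 1;
    }
    let mut len = 0;
    for _ in 0..records {
        if !have_buffer {
            // Lazy policy: allocate only when a record arrives.
            have_buffer = true;
            allocs += 1;
        }
        len += 1;
        if len == capacity {
            // Flush: the buffer goes downstream and nothing comes back.
            have_buffer = false;
            len = 0;
            if eager {
                // Eager policy: immediately allocate a replacement.
                have_buffer = true;
                allocs += 1;
            }
        }
    }
    allocs
}
```

For, say, 10 records with a capacity of 4, both policies allocate three buffers; only the moment at which each allocation happens differs, which is exactly the argument made above.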