grpc faster than UDS #10

tony-clarke-amdocs opened this issue Oct 4, 2022 · 4 comments
@tony-clarke-amdocs

When I run the benchmark test on my MacBook, I observe that for a 1 MB payload, gRPC is almost twice as fast as UDS. It was mentioned that "To get better performance in large sizes, I had to add some kernel buffer space over the defaults, which lead to close to double performance". Does anyone know how to add "some kernel buffer space over the defaults"?

Results below:

2022/10/04 17:37:17 benchmark.go:119: Running tests for: uds
2022/10/04 17:37:17 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024, Alloc:false}
2022/10/04 17:37:18 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:10240, Alloc:false}
2022/10/04 17:37:20 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:102400, Alloc:false}
2022/10/04 17:37:22 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024000, Alloc:false}
2022/10/04 17:37:38 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:1024, Alloc:true}
2022/10/04 17:37:42 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:10240, Alloc:true}
2022/10/04 17:37:46 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:102400, Alloc:true}
2022/10/04 17:37:57 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:1024000, Alloc:true}

Test Results(uds):

[Speed]

[12 Users][10000 Requests][1.0 kB Bytes] - min 44.145µs/sec, max 1.626258ms/sec, avg 233.776µs/sec, rps 50624.37

[12 Users][10000 Requests][10 kB Bytes] - min 91.93µs/sec, max 1.305938ms/sec, avg 396.359µs/sec, rps 30119.14

[12 Users][10000 Requests][102 kB Bytes] - min 751.183µs/sec, max 7.131249ms/sec, avg 1.284381ms/sec, rps 9317.00

[12 Users][10000 Requests][1.0 MB Bytes] - min 8.975998ms/sec, max 183.318841ms/sec, avg 17.873009ms/sec, rps 670.42

[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 290,385

[10000 Requests][10 kB Bytes] - allocs 302,219

[10000 Requests][102 kB Bytes] - allocs 321,047

[10000 Requests][1.0 MB Bytes] - allocs 336,841

2022/10/04 17:38:59 benchmark.go:119: Running tests for: grpc
2022/10/04 17:38:59 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024, Alloc:false}
2022/10/04 17:39:01 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:10240, Alloc:false}
2022/10/04 17:39:02 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:102400, Alloc:false}
2022/10/04 17:39:04 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024000, Alloc:false}
2022/10/04 17:39:13 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024, Alloc:true}
2022/10/04 17:39:17 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:10240, Alloc:true}
2022/10/04 17:39:22 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:102400, Alloc:true}
2022/10/04 17:39:30 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:1024000, Alloc:true}

Test Results(grpc):

[Speed]

[12 Users][10000 Requests][1.0 kB Bytes] - min 41.172µs/sec, max 3.198751ms/sec, avg 231.151µs/sec, rps 50555.10

[12 Users][10000 Requests][10 kB Bytes] - min 67.013µs/sec, max 3.159617ms/sec, avg 381.454µs/sec, rps 30822.76

[12 Users][10000 Requests][102 kB Bytes] - min 215.603µs/sec, max 5.728779ms/sec, avg 1.115849ms/sec, rps 10716.17

[12 Users][10000 Requests][1.0 MB Bytes] - min 3.849356ms/sec, max 17.255924ms/sec, avg 9.977688ms/sec, rps 1201.41

[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 1,518,299

[10000 Requests][10 kB Bytes] - allocs 1,701,021

[10000 Requests][102 kB Bytes] - allocs 1,838,238

[10000 Requests][1.0 MB Bytes] - allocs 2,036,568

@johnsiilver
Owner

Wanted to leave a note - I'm currently traveling, so it will be a bit before I get a chance to look at this. There are multiple possibilities, including new Go versions, OS changes, etc.

As for the "kernel buffer space over the defaults", I vaguely remember this being some setting you can make for how much buffer space a Unix socket gets. It's a terrible note since I didn't clarify what the command was to change it. I'd smack my 2020 self in the head if I could.

@johnsiilver
Owner

Well, apparently I can't sleep, so I decided to run the benchmarks on my machine.

Note that this Mac is newer than the one I ran the published benchmarks on in the docs:

2021 MacBook Pro, M1 Max, 64 GB
macOS 12.6
Go 1.19 darwin/arm64

To be clear, I'm running the benchmark in ipc/uds/highlevel/proto/rpc/benchmark.

In this case, I didn't make any change to the IPC buffer. I think what I had been adjusting before was the IPC buffer size, which by default is set to kern.ipc.maxsockbuf: 8388608.

I can change this with sysctl -w kern.ipc.maxsockbuf=[number] and view the current value with sysctl kern.ipc.maxsockbuf.
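For reference, the commands are roughly as follows (the value shown is only an example, not a recommendation; writing the sysctl needs root and the change does not persist across reboots):

sysctl kern.ipc.maxsockbuf                    # view the current maximum socket buffer size
sudo sysctl -w kern.ipc.maxsockbuf=16777216   # example: double the 8388608 default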

Here's the results:

Test Results(uds):

[Speed]

[10 Users][10000 Requests][1.0 kB Bytes] - min 12.709µs/sec, max 19.393583ms/sec, avg 157.727µs/sec, rps 61861.73

[10 Users][10000 Requests][10 kB Bytes] - min 17.75µs/sec, max 14.697584ms/sec, avg 300.277µs/sec, rps 32853.74

[10 Users][10000 Requests][102 kB Bytes] - min 88.666µs/sec, max 11.183584ms/sec, avg 615.613µs/sec, rps 16175.53

[10 Users][10000 Requests][1.0 MB Bytes] - min 1.343291ms/sec, max 24.624583ms/sec, avg 3.244991ms/sec, rps 3077.23

[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 290,623

[10000 Requests][10 kB Bytes] - allocs 302,992

[10000 Requests][102 kB Bytes] - allocs 321,978

[10000 Requests][1.0 MB Bytes] - allocs 334,421

Test Results(grpc):

[Speed]

[10 Users][10000 Requests][1.0 kB Bytes] - min 33.458µs/sec, max 3.61625ms/sec, avg 192.079µs/sec, rps 50615.82

[10 Users][10000 Requests][10 kB Bytes] - min 74.167µs/sec, max 5.155416ms/sec, avg 470.981µs/sec, rps 21074.70

[10 Users][10000 Requests][102 kB Bytes] - min 1.24125ms/sec, max 7.500666ms/sec, avg 2.412776ms/sec, rps 4140.90

[10 Users][10000 Requests][1.0 MB Bytes] - min 11.356542ms/sec, max 47.805ms/sec, avg 19.213821ms/sec, rps 520.33

[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 1,524,532

[10000 Requests][10 kB Bytes] - allocs 1,691,127

[10000 Requests][102 kB Bytes] - allocs 1,958,763

[10000 Requests][1.0 MB Bytes] - allocs 3,262,682

I have UDS beating GRPC by 5.9x in speed in the 1MB category. So I'm not sure what the discrepancy is with your run of the test.

I adjusted kern.ipc.maxsockbuf, doubling it and then quadrupling it, but funnily enough this only had a significant effect on the lower end. It might be that I was referring to much larger buffer sizes that I had tested but removed from the benchmark. I really wish I had been clearer there...

It might be that with more information I'll be able to figure out what the difference is, but with what I've got at the moment I'm not sure what the issue is.

@tony-clarke-amdocs
Author

Thanks for taking time out of your night to take a look.

I think my current Mac is similar to your prior Mac. I noticed that you are now running macOS Monterey, so based on that I decided to upgrade. After the upgrade I see slightly different results:

Test Results(uds):

[Speed]

[12 Users][10000 Requests][1.0 kB Bytes] - min 46.394µs/sec, max 1.356804ms/sec, avg 214.1µs/sec, rps 55233.59

[12 Users][10000 Requests][10 kB Bytes] - min 75.537µs/sec, max 2.037849ms/sec, avg 391.558µs/sec, rps 30499.86

[12 Users][10000 Requests][102 kB Bytes] - min 706.411µs/sec, max 4.307707ms/sec, avg 1.29047ms/sec, rps 9272.03

[12 Users][10000 Requests][1.0 MB Bytes] - min 7.306673ms/sec, max 39.574033ms/sec, avg 14.377907ms/sec, rps 834.06

[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 290,383

[10000 Requests][10 kB Bytes] - allocs 302,228

[10000 Requests][102 kB Bytes] - allocs 321,473

[10000 Requests][1.0 MB Bytes] - allocs 338,836

Test Results(grpc):

[Speed]

[12 Users][10000 Requests][1.0 kB Bytes] - min 41.153µs/sec, max 1.780725ms/sec, avg 224.803µs/sec, rps 52141.44

[12 Users][10000 Requests][10 kB Bytes] - min 102.856µs/sec, max 3.374822ms/sec, avg 523.397µs/sec, rps 22839.44

[12 Users][10000 Requests][102 kB Bytes] - min 1.187308ms/sec, max 7.811926ms/sec, avg 2.588879ms/sec, rps 4632.78

[12 Users][10000 Requests][1.0 MB Bytes] - min 11.411182ms/sec, max 33.643369ms/sec, avg 21.227748ms/sec, rps 565.16

[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 1,525,499

[10000 Requests][10 kB Bytes] - allocs 1,691,013

[10000 Requests][102 kB Bytes] - allocs 1,957,932

[10000 Requests][1.0 MB Bytes] - allocs 3,270,446

Now UDS is slightly faster than grpc. Increasing kern.ipc.maxsockbuf doesn't seem to make much of a difference. But if I set net.local.stream.recvspace=1280000 and net.local.stream.sendspace=1280000 and rerun, I get the following (the exact commands are at the end of this comment):

Test Results(uds):
[Speed]

[12 Users][10000 Requests][1.0 kB Bytes] - min 37.969µs/sec, max 1.447162ms/sec, avg 203.445µs/sec, rps 58190.32

[12 Users][10000 Requests][10 kB Bytes] - min 61.718µs/sec, max 1.478607ms/sec, avg 392.791µs/sec, rps 30416.75

[12 Users][10000 Requests][102 kB Bytes] - min 667.759µs/sec, max 6.745718ms/sec, avg 1.258853ms/sec, rps 9504.93

[12 Users][10000 Requests][1.0 MB Bytes] - min 7.148514ms/sec, max 40.472526ms/sec, avg 14.250775ms/sec, rps 841.51

[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 290,385

[10000 Requests][10 kB Bytes] - allocs 302,219

[10000 Requests][102 kB Bytes] - allocs 321,025

[10000 Requests][1.0 MB Bytes] - allocs 338,544

Test Results(grpc):

[Speed]

[12 Users][10000 Requests][1.0 kB Bytes] - min 51.922µs/sec, max 2.425716ms/sec, avg 231.266µs/sec, rps 50616.78

[12 Users][10000 Requests][10 kB Bytes] - min 73.416µs/sec, max 2.124342ms/sec, avg 365.51µs/sec, rps 32077.50

[12 Users][10000 Requests][102 kB Bytes] - min 217.308µs/sec, max 5.464793ms/sec, avg 1.005719ms/sec, rps 11883.91

[12 Users][10000 Requests][1.0 MB Bytes] - min 5.468018ms/sec, max 15.044432ms/sec, avg 8.622932ms/sec, rps 1390.03

[Allocs]

[10000 Requests][1.0 kB Bytes] - allocs 1,518,328

[10000 Requests][10 kB Bytes] - allocs 1,701,023

[10000 Requests][102 kB Bytes] - allocs 1,913,110

[10000 Requests][1.0 MB Bytes] - allocs 1,941,978

Now, GRPC is faster again. I don't know if others will get the same boost in GRPC by setting net.local.stream.recvspace and net.local.stream.sendspace.
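For reference, the settings for the run above were applied as follows (1280000 is just the value I happened to pick; root is required and the values reset on reboot):

sudo sysctl -w net.local.stream.recvspace=1280000
sudo sysctl -w net.local.stream.sendspace=1280000
sysctl net.local.stream.recvspace net.local.stream.sendspace   # verify the new values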

@johnsiilver
Owner

johnsiilver commented Oct 7, 2022 via email
