client is a potential bottleneck #39

Open · aglyzov opened this issue Jun 21, 2012 · 15 comments

aglyzov (Contributor) commented Jun 21, 2012

When testing, one has to ensure the client machine is powerful enough to withstand the CPU load created by the Erlang client program.

On several occasions I saw the client consume more CPU than the server on an identical pair of machines. Watching the htop output of both machines at the same time, it was clear that the Erlang client was CPU-bound while the server had a fair amount in reserve. This was especially so in the first stage of the test, when new connections get created. Then, after some connections died off due to the client timeout, the client's CPU usage dropped considerably.

So, assuming the testing machines are the same, the client might be a bottleneck in some cases.
This needs to be checked thoroughly.
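
One straightforward way to check would be to record a CPU trace on both boxes for the duration of a run and compare them afterwards, rather than eyeballing htop. A minimal sketch, assuming the psutil package is available (the interval and output format are arbitrary):

```python
# cpu_trace.py -- run simultaneously on the client and the server machine.
# Emits one timestamped line per second with system-wide CPU utilisation,
# so the two traces can be lined up and compared after the test.
import time
import psutil

while True:
    # cpu_percent(interval=1) blocks for one second and returns the
    # average CPU utilisation over that window.
    print(f"{time.time():.0f} {psutil.cpu_percent(interval=1):.1f}", flush=True)
```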

jlouis (Contributor) commented Jun 22, 2012

That sounds interesting. It is also interesting because different servers still handle the connections differently. If the client were the sole problem, then all servers "able to keep up" should show roughly the same behaviour. From my initial tests on the handshake-time data, only two systems exhibit the same behaviour: erlang-cowboy and go-websocket. The rest of the bunch have considerably different characteristics.

I agree this is worth investigating. Among other things, I'll try reading through the client code to figure out what it does and whether I can find anything odd in there.

aglyzov (Author) commented Jun 22, 2012

@jlouis, note that my systems were both single-core. Considering that the client does much more processing than a simplistic server, that might be what caused the oddity. Also, I can confirm that almost all systems behave comparably in my tests. I should add another core to the client machine and try again. Thanks for the insight.

aglyzov (Author) commented Jun 22, 2012

So guys, I added a second CPU core to my client machine, ran some tests, and now have interesting results for you.

First of all, I think my theory about the client being a bottleneck in some cases was right. Check out these screenshots to see what I mean (client with 2 CPU cores on the left, server with 1 CPU core on the right):
java-webbit: https://dl.dropbox.com/u/4663634/websocket-test/java-webbit.png
pypy-twisted: https://dl.dropbox.com/u/4663634/websocket-test/twisted-pypy-1.png
pypy-tornado: https://dl.dropbox.com/u/4663634/websocket-test/tornado-pypy-1.png

Results:
https://dl.dropbox.com/u/4663634/websocket-test/websocket-test-results.txt

On a side note: Haskell and Go were unbelievably awful in terms of memory consumption. While it is a known fact that Go has severe memory problems on 32-bit architectures due to its questionable GC design, I am surprised by the haskell-snap behavior.
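
To put numbers on "awful", one could log each server process's resident set size during a run instead of relying on htop snapshots. A minimal sketch, again assuming psutil, with the server's PID passed on the command line:

```python
# mem_trace.py <pid> -- sample the resident set size of the server process
# once a second, so memory growth can be compared across implementations.
import sys
import time
import psutil

proc = psutil.Process(int(sys.argv[1]))  # PID of the server under test
while True:
    rss_mb = proc.memory_info().rss / (1024 * 1024)
    print(f"{time.time():.0f} {rss_mb:.1f} MB", flush=True)
    time.sleep(1)
```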

jlouis (Contributor) commented Jun 22, 2012

That definitely looks like an overload problem on your hardware. Also note you are not even getting the 10k handshakes Eric was getting on Go and Erlang, so Eric's faster machine might help here with handling all the connections. Perhaps you, as well as Eric, should post specs, so we have an idea of what kind of machine is currently needed to handle the load.

As for the 32-bit limit of the Go GC, it is the price they have to pay because their language does not use a precise GC (a rather bad decision, IMO).

aglyzov (Author) commented Jun 22, 2012

@jlouis, I am not sure it is overload now that I have added the second core to the client; at least it is no longer due to CPU. Perhaps it is some other hidden cost of virtualization. I am indeed eager to see the results on real hardware.

Note that with the screenshots I was trying to show that the Erlang client consumed more than one CPU core to keep up with certain fast servers.

ericmoritz (Owner) commented

The server hardware that I have is the following:

AMD Phenom 9600 quad-core @ 2300 MHz
2 GB of memory

The client I will be using is my MacBook Pro running Ubuntu 12.04 via Boot Camp (at least, that is the plan).

MBP's stats:

```
$ sysctl hw
hw.ncpu: 8
hw.byteorder: 1234
hw.memsize: 8589934592
hw.activecpu: 8
hw.physicalcpu: 4
hw.physicalcpu_max: 4
hw.logicalcpu: 8
hw.logicalcpu_max: 8
hw.cputype: 7
hw.cpusubtype: 4
hw.cpu64bit_capable: 1
hw.cpufamily: 1418770316
hw.cacheconfig: 8 2 2 8 0 0 0 0 0 0
hw.cachesize: 8589934592 32768 262144 6291456 0 0 0 0 0 0
hw.pagesize: 4096
hw.busfrequency: 100000000
hw.busfrequency_min: 100000000
hw.busfrequency_max: 100000000
hw.cpufrequency: 2200000000
hw.cpufrequency_min: 2200000000
hw.cpufrequency_max: 2200000000
hw.cachelinesize: 64
hw.l1icachesize: 32768
hw.l1dcachesize: 32768
hw.l2cachesize: 262144
hw.l3cachesize: 6291456
hw.tbfrequency: 1000000000
hw.packages: 1
hw.optional.floatingpoint: 1
hw.optional.mmx: 1
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.supplementalsse3: 1
hw.optional.sse4_1: 1
hw.optional.sse4_2: 1
hw.optional.x86_64: 1
hw.optional.aes: 1
hw.optional.avx1_0: 1
hw.cputhreadtype: 1
hw.machine = x86_64
hw.model = MacBookPro8,2
hw.ncpu = 8
hw.byteorder = 1234
hw.physmem = 2147483648
hw.usermem = 943783936
hw.pagesize = 4096
hw.epoch = 0
hw.vectorunit = 1
hw.busfrequency = 100000000
hw.cpufrequency = 2200000000
hw.cachelinesize = 64
hw.l1icachesize = 32768
hw.l1dcachesize = 32768
hw.l2settings = 1
hw.l2cachesize = 262144
hw.l3settings = 1
hw.l3cachesize = 6291456
hw.tbfrequency = 1000000000
hw.memsize = 8589934592
hw.availcpu = 8
```

ericmoritz (Owner) commented

Sorry, the server only has 2 GB of memory; I copy/pasted the specs from the Craigslist ad. One of the 2 GB modules was bad, so I removed it.

I may have to pick up a 1 or 2 GB module if the OS plus each server start swapping.

ericmoritz (Owner) commented

Does anyone know if I should add a "cool down" period between stopping one server and starting the next? Could there be residual effects from one test in the kernel that could affect the results of another?

ericmoritz (Owner) commented

To save you some googling, the server is 64-bit.

aglyzov (Author) commented Jun 22, 2012

Once all the processes have exited or been killed it should be fine. Add a 15-second pause to be on the safe side.
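
A harness could enforce that pause between runs. A minimal sketch, where start-server.sh and run-client.sh are hypothetical wrappers for however each server and the benchmark client are actually launched:

```python
# run_suite.py -- benchmark each server in turn with a cool-down pause in
# between, so residual kernel state (e.g. lingering sockets) can settle.
import subprocess
import time

SERVERS = ["erlang-cowboy", "go-websocket", "pypy-tornado"]  # example names from this thread
COOL_DOWN_SECONDS = 15

for name in SERVERS:
    server = subprocess.Popen(["./start-server.sh", name])      # hypothetical launcher
    try:
        subprocess.run(["./run-client.sh", name], check=True)   # hypothetical client wrapper
    finally:
        server.terminate()
        server.wait()
    time.sleep(COOL_DOWN_SECONDS)  # the 15 s cool-down suggested above
```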

aglyzov (Author) commented Jun 23, 2012

Update: I've been testing the servers on a pair of Linode 512 machines. The outcome: basic Linode hardware is capable of handling ~19k active concurrent connections (pypy, erlang, java).

That's about $1 a month per 1,000 websockets :)
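
As a back-of-the-envelope check of that figure, assuming the Linode 512 plan cost roughly $20/month at the time:

```python
# Rough cost-per-connection arithmetic; the plan price is an assumption.
monthly_price_usd = 20.0          # assumed Linode 512 price
concurrent_connections = 19_000   # measured above
print(monthly_price_usd / (concurrent_connections / 1_000))  # ~1.05 USD per 1k connections/month
```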

ericmoritz (Owner) commented

I like how this thing is turning into a way to benchmark VPS hosts as well as individual WS implementations.

perone commented Jul 2, 2012

@aglyzov, what was the number of active concurrent connections on the Linode for the other benchmarks, like gevent for instance?

aglyzov (Author) commented Jul 3, 2012

@perone, gevent-websocket was unfortunately not doing great. There was a cut-off near 11k.

perone commented Jul 3, 2012

@aglyzov, thanks for sharing!
