Question #140
Comments
Hi @Samistine,

We are using this project in limited production at this point, with between 80 and 100 active clients spread across three instances of this server. I believe that load could be supported by one instance, but I haven't reduced my load balancing lately to test. The capacity-limiting factor at this point seems to have more to do with the number of active clients and the amount of churn than with the number of events raised. During preliminary testing, I had a small number (~20) of devices producing multiple events per second each without any real issues.

My (very) simple load balancing solution can be found here: https://github.com/Ario-Inc/spark-routing-db. At present it requires that the "./spark-server/data/" folder be mounted on a file system shared by all instances.

Cheers
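For readers who want the shape of that setup, below is a minimal, hypothetical sketch of a device-to-instance routing table. It is not the actual spark-routing-db code; the routes-file path and the server pool are assumptions, and the one grounded detail is that the state must live on a file system all instances share.

```javascript
// Hypothetical sketch of a device-to-instance routing table; NOT the actual
// spark-routing-db implementation. Assumes the table is a JSON file on the
// mount shared by all spark-server instances.
const fs = require('fs');

const ROUTES_FILE = '/mnt/shared/spark-server/data/routes.json'; // assumed path
const SERVERS = ['10.0.0.1:5683', '10.0.0.2:5683', '10.0.0.3:5683']; // assumed pool

function loadRoutes() {
  try {
    return JSON.parse(fs.readFileSync(ROUTES_FILE, 'utf8'));
  } catch (e) {
    return {}; // first run: no routing table yet
  }
}

// Return the instance a device should connect to, pinning the device to the
// least-populated instance the first time it is seen.
function routeDevice(deviceID) {
  const routes = loadRoutes();
  if (!routes[deviceID]) {
    const counts = SERVERS.map(
      (server) => Object.keys(routes).filter((id) => routes[id] === server).length
    );
    routes[deviceID] = SERVERS[counts.indexOf(Math.min(...counts))];
    fs.writeFileSync(ROUTES_FILE, JSON.stringify(routes));
  }
  return routes[deviceID];
}

module.exports = { routeDevice };
```

Pinning each device to a single instance is what makes the shared data folder workable: whichever instance a device lands on can read the same stored device keys.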
Hi @wdimmit,

We're getting ready to launch into production in the next week, so your experience is super interesting to us right now :) What kind of symptoms do you see when the spark server is not able to handle load? Does it get spotty handling requests, or does the whole thing just go down? If you don't mind my asking, how 'beefy' are your server(s)?

The number-of-clients issue is definitely a concern. I'll have to take a look at your load balancer and maybe have that ready, just in case.
@Snazzypants - how many devices? You might need to use more servers until we add in clustering.
When the server hits its upper limit (probably between 80 and 120 clients at the moment), it appears to stop registering some keep-alive pings from the clients. This leads to clients disconnecting somewhat randomly at multiples of the timeout interval (15s). Also, if you dump a bunch of clients on a server at once (more than 20 or so), some percentage of them will disconnect at the first timeout interval. This means that if you add 50 clients to an empty server, perhaps 20 will disconnect; those 20 will then reconnect, causing 10 to disconnect, and then things will stabilize. The exact numbers here are estimates from memory.

I'm currently running 3 device servers, each on an Azure single-core VM. The load levels are essentially zero; this is not a CPU-bound process. I'm running the processes on separate VMs for redundancy, not extra processing power.

Finally, check out my code at these two points in the spark-protocol project to see how I'm communicating with my connection router:
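One cheap way to test the missed-keep-alive theory from the outside is to measure event-loop lag on the server process. The standalone probe below is not part of spark-server; the probe interval and warning threshold are arbitrary choices, and the 15s figure is the timeout interval mentioned above.

```javascript
// Standalone event-loop lag probe (not spark-server code). If the loop is
// blocked long enough that timers fire very late, keep-alive pings queued
// behind the blockage would plausibly be "missed" in the same way.
const KEEP_ALIVE_MS = 15000; // the 15s timeout interval discussed above
const PROBE_MS = 500;        // how often we expect our timer to fire

let last = Date.now();
setInterval(() => {
  const lag = Date.now() - last - PROBE_MS; // time beyond the scheduled tick
  if (lag > KEEP_ALIVE_MS / 2) {
    console.warn(`event loop lagged ~${lag}ms; keep-alives are at risk`);
  }
  last = Date.now();
}, PROBE_MS);
```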
@wdimmit Thanks for the info!

@jlkalberer We'll probably start out with ~75 in the first two weeks, then continue to add more as more people come online. The clustering would require multiple CPUs, is that correct? Right now we are also on a single core. It seems like it would still be more cost-efficient to set up multiple cheap servers rather than cluster on a multi-core machine?
Well... I'm thinking there is a bug somewhere in our implementation. Your server should be able to handle way more than 80-120 clients without disconnects occurring. What I think is happening is that somewhere we are blocking the thread, which blocks the pings from the devices. With clustering I'm hoping that while the CPU is blocked, it switches to another thread, and that will allow more devices to be connected even with a single core. In the short term this is a quick fix, but in the long term I'd love to figure out why the thread is blocking.
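For reference, the clustering discussed here could be done with Node's built-in cluster module. The sketch below is an assumption about how it might be wired up, not existing spark-server code, and `./main` is a hypothetical entry point. Because each worker is a separate OS process, the scheduler can run another worker while one is blocked, even on a single core.

```javascript
// Sketch of single-port clustering with Node's built-in cluster module.
// NOT existing spark-server code; worker count and entry point are assumptions.
const cluster = require('cluster');
const os = require('os');

// At least two workers so one blocked process doesn't stall all pings,
// even on a single-core VM.
const WORKERS = Math.max(os.cpus().length, 2);

if (cluster.isMaster) {
  for (let i = 0; i < WORKERS; i += 1) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died; restarting`);
    cluster.fork(); // keep the pool at full strength
  });
} else {
  // Each worker runs the server; connections to the shared listening port
  // are distributed across the workers by the cluster module.
  require('./main'); // hypothetical entry point
}
```

One caveat: cluster forks processes, not threads, so workers share no memory. Any per-device state would have to live outside the worker, which is the same constraint the shared data folder already addresses.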
@wdimmit So I'm trying to set up your load balancing solution. I cloned your fork of spark-protocol, but am getting an error when starting up spark-server. Any help would be appreciated... we will definitely need to get your solution working in the next few days, as we are shipping soon.

EDIT: Sorry, ignore that... the error I was getting was related to some memory issues when installing packages. I've made a bit more progress and am slowly getting there, I think... (Sorry to threadjack, but I didn't see a place to open a separate issue in your spark-protocol fork.)
Would you say this would work well for a production product? What would be a reasonable quantity of device interactions for this to handle on a modern computer?