
Question #140

Closed
Samistine opened this issue May 9, 2017 · 8 comments

Comments

@Samistine

Would you say this would work well for a production product?
What would be a reasonable number of device interactions that this could handle on a modern computer?

@jlkalberer

There are a few companies using this in production with a few hundred devices + some horizontal scaling.

We are actively working on #136 and #137 which should significantly improve the number of devices supported on a server.

cc: @wdimmit

wdimmit commented May 9, 2017

Hi @Samistine

We are using this project in limited production at this point with between 80 and 100 active clients spread between three instances of this server. I believe that load could be supported by one instance, but I haven't reduced my load balancing lately to test.

The capacity-limiting factor at this point seems to have more to do with the number of active clients and the amount of churn than with the number of events raised. During preliminary testing, I had a small number (~20) of devices producing multiple events per second each without any real issues.

My (very) simple load balancing solution can be found here: https://github.com/Ario-Inc/spark-routing-db. At present it requires that the "./spark-server/data/" folder be mounted on a file system shared by all instances.
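For anyone curious about the general shape of that approach, the core idea is a stable mapping from device ID to server instance, so each spark-server only ever handles its own subset of devices. A hypothetical sketch of that idea (the real spark-routing-db implementation surely differs; the addresses and hash are made up for illustration):

```javascript
// Hypothetical sketch of device-to-server routing (NOT the actual
// spark-routing-db code). A stable hash of the device ID picks one of
// the server instances, so a given device always lands on the same one.
const SERVERS = ['10.0.0.1:5683', '10.0.0.2:5683', '10.0.0.3:5683']; // example addresses

function serverForDevice(deviceID) {
  let hash = 0;
  for (const ch of deviceID) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple stable string hash
  }
  return SERVERS[hash % SERVERS.length];
}
```

The shared "./spark-server/data/" mount then lets whichever instance a device lands on read the same device keys.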

Cheers,
Will

@lilyannh

Hi @wdimmit

We're getting ready to launch into production in the next week, so your experience is super interesting to us right now :) What kind of symptoms do you see when the spark server is not able to handle load? Does it get spotty handling requests, or does the whole thing just go down? If you don't mind my asking, how 'beefy' are your servers?

The # of clients issue is definitely a concern. I'll have to take a look at your load balancer and maybe have that ready, just in case.

@jlkalberer

@Snazzypants - how many devices? You might need to use more servers until we add in clustering.

wdimmit commented May 17, 2017

@Snazzypants

When the server hits its upper limit (probably between 80-120 clients at the moment), it appears to stop registering some keep-alive pings from the clients. This leads to clients disconnecting somewhat randomly at multiples of the timeout interval (15s).

Also, if you dump a bunch of clients on a server at once (>20 or so), some percentage of them will disconnect at the first timeout interval. This means that if you add 50 clients to an empty server, perhaps 20 will disconnect; those 20 will then reconnect, causing 10 to disconnect, and then things will stabilize. The exact numbers here are estimates from memory.
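The disconnect pattern described above is what you'd expect from a keep-alive watchdog that misses late pings. A minimal sketch of that kind of watchdog (hypothetical names, not the actual spark-protocol code; the timeout parameter is added for illustration):

```javascript
// Hypothetical sketch of a keep-alive watchdog (NOT the actual
// spark-protocol implementation). Each connection arms a timer; a ping
// from the device resets it, and if no ping arrives within the timeout
// the connection is considered dead and dropped.
const KEEP_ALIVE_TIMEOUT_MS = 15000; // the 15s interval mentioned above

class DeviceConnection {
  constructor(deviceID, onTimeout, timeoutMs = KEEP_ALIVE_TIMEOUT_MS) {
    this.deviceID = deviceID;
    this.onTimeout = onTimeout;
    this.timeoutMs = timeoutMs;
    this.timer = null;
    this.resetWatchdog();
  }

  // Called for every keep-alive ping received from the device.
  onPing() {
    this.resetWatchdog();
  }

  resetWatchdog() {
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.onTimeout(this.deviceID), this.timeoutMs);
  }

  close() {
    clearTimeout(this.timer);
  }
}
```

If the server's event loop is busy when a ping arrives, `onPing` runs late and the watchdog may already have fired, which would match the "stops registering some keep-alive pings" symptom.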

I'm currently running 3 device servers, each on an Azure single-core VM. The load levels are essentially 0 - this is not a CPU-bound process. I'm running the processes on separate VMs for redundancy, not extra processing power.

Finally, check out my code at these two points in the spark-protocol project to see how I'm communicating with my connection router:
https://github.com/Ario-Inc/spark-protocol/blob/master/src/clients/Device.js#L125
https://github.com/Ario-Inc/spark-protocol/blob/master/src/clients/Device.js#L314

@lilyannh

@wdimmit Thanks for the info!

@jlkalberer we'll probably start out with ~75 in the first two weeks, then continue to add more as more people come online.

The clustering would require multiple CPUs, is that correct? Right now we are also on a single core. Seems like it would still be more cost efficient to set up multiple cheapo servers rather than cluster on a multi core machine?

@jlkalberer

Well... I'm thinking there is a bug somewhere in our implementation. Your server should be able to handle way more than 80-120 clients without disconnects occurring.

What I think is happening is that somewhere we are blocking the thread, which blocks the pings from the devices. With clustering I'm hoping that while the CPU is blocked, it switches to another worker, and that will allow more devices to be connected even with a single core.

In the short term this is a quick fix, but in the long term I'd love to figure out why the thread is blocking.

lilyannh commented May 21, 2017

@wdimmit So I'm trying to set up your load balancing solution. I cloned over your fork of spark-protocol, but am getting an error when starting up spark-server. Any help would be appreciated... we will definitely need to get your solution working in the next few days as we are shipping soon.

EDIT: sorry, ignore that... the error I was getting was related to some memory issues when installing packages. I've made a bit more progress; slowly getting there, I think...

(sorry to threadjack but I didn't see a place to open a separate issue in your spark-protocol fork)
