Better Error Catching Due to Perf/Env Failures #23
Comments
This is due to a default timeout of 60 seconds defined here: https://github.com/s-nomp/node-stratum-pool/blob/master/lib/daemon.js#L55 It is actually a good thing that you are seeing these errors (compared to not seeing them). It means there is an issue with the node that needs to be resolved. Allowing overly long RPC calls to go unnoticed hides a serious problem, leading to your pool finding fewer blocks, among other issues. I don't think a 60 second timeout is unreasonable, especially when you're talking about mining -- every millisecond counts. First, check the size of your |
@egyptianbman It's fine that we're seeing the errors; that part is good. My issue is that the entire pool crashes after about 30 seconds of these spammed errors. I'll post more about it if I get time to test. I can recreate this by killing the first DNS server configured in Linux. The expected behavior is for s-nomp to spit out errors about socket hangups and then retry. The current behavior is infinite socket hangups until the pool is restarted manually. This also causes the pool forks to crash. I think network hangups (even if the daemon is just hung) can be handled a bit better, to include self-healing. |
The infinite retries could be caused by pm2 if you're using that. We definitely need more information about which calls are failing. Depending on which call is failing, the resolution could be completely different. |
@egyptianbman I'll try it out without PM2 to see if it's an issue with that. Thanks for the opinion. 😄 |
So I've run into this with my ZEN node. It seems if one getblocktemplate call delays past 60 seconds, it goes into a constant try-and-fail loop. My suspicion is that these getblocktemplate calls are building on top of themselves so I'm working on some code to allow this call to be longer than usual. I'm testing right now to try to find the sweet spot limit. |
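One way to test the "calls building on top of themselves" suspicion is to make sure only one getblocktemplate request is ever in flight and to make the timeout configurable. Below is a minimal sketch of that idea, not code from node-stratum-pool: `singleFlight`, `rpcCall` and `timeoutMs` are hypothetical names, and `rpcCall` is assumed to be any function that returns a promise for the daemon's reply.

```javascript
// Hypothetical helper, not part of node-stratum-pool.
// Wraps a promise-returning RPC function so that overlapping calls share
// one pending request instead of stacking up on the daemon.
function singleFlight(rpcCall, timeoutMs) {
  let inFlight = null;

  return function () {
    // A request is already pending: hand back the same promise.
    if (inFlight) return inFlight;

    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error('rpc timed out after ' + timeoutMs + 'ms')),
        timeoutMs
      );
    });

    // rpcCall() runs immediately; a synchronous throw becomes a rejection.
    const call = new Promise((resolve) => resolve(rpcCall()));

    inFlight = Promise.race([call, timeout]).finally(() => {
      // Whatever the outcome, stop the timer and allow the next call.
      clearTimeout(timer);
      inFlight = null;
    });
    return inFlight;
  };
}
```

Note that a timeout here only stops waiting; it does not cancel the request on the daemon side, so a slow daemon may still be busy with the old call when the next one is sent.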
Most of the time, when node-stratum-pool has a slight lag with RPC from the daemons (due to hardware/software performance, network latency, etc.), the whole pool crashes or the pool stops listening because of a very quick timeout.
I couldn't imagine if Chrome crashed every time Facebook wouldn't load the whole way for somebody. lol.
E.G.
Using DNS names to talk from node-stratum-pool to the daemons. If the main DNS server times out, Linux by default waits 5 seconds (crazy long, I know) before it starts using the second DNS server. s-nomp will give a socket hangup almost instantly if the daemon doesn't respond. That part is good, but the connection doesn't time out on node-stratum's side, and it retries the socket indefinitely. DNS is used in scaled setups; being forceful on IP is something I'm against. For now I've worked around this by setting the DNS timeout to 500ms.
Another scenario from someone else.
from jacko0088.
My proposed solution is to force a reconnect to the daemon after N socket hangups (with N set by the end user or statically coded), plus a timeout_connect option.
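To make the "reconnect after N socket hangups" idea concrete, here is a rough sketch of the counting logic only. It is an illustration under assumptions, not existing node-stratum-pool code: `HangupTracker`, `maxHangups` and `onReconnect` are hypothetical names, and the actual reconnect (and any timeout_connect handling) is left to the callback. It relies on Node's HTTP client reporting hangups as an error with message `socket hang up` and code `ECONNRESET`.

```javascript
// Hypothetical helper, not part of node-stratum-pool.
// Counts consecutive socket hangups and triggers a reconnect callback
// once the configured limit is reached.
class HangupTracker {
  constructor(maxHangups, onReconnect) {
    this.maxHangups = maxHangups;   // N, set by the end user or hard-coded
    this.onReconnect = onReconnect; // assumed to tear down and reopen the daemon connection
    this.consecutive = 0;
  }

  // Call after any successful RPC reply: the streak is broken.
  recordSuccess() {
    this.consecutive = 0;
  }

  // Call with every RPC error. Returns true if a reconnect was triggered.
  recordError(err) {
    const isHangup = Boolean(err) &&
      (err.code === 'ECONNRESET' || err.message === 'socket hang up');
    if (!isHangup) return false; // other errors are not counted here

    this.consecutive += 1;
    if (this.consecutive >= this.maxHangups) {
      this.consecutive = 0; // start a fresh streak after reconnecting
      this.onReconnect();
      return true;
    }
    return false;
  }
}
```

Counting only consecutive hangups means an occasional blip never forces a reconnect, while a daemon that stays unreachable gets a fresh connection after N failures instead of being retried on the same socket forever.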