Stops Receiving messages but can still send #253
A bit more information on how the above fault behaves. It appears that redeploying a Receiver or Command node does not get the flow working again, but redeploying the BOT node does. I do get an error when I redeploy the bot, in the debug window of
This node ID is for the Telegram BOT node (I have deliberately x'd out the NGROK tunnel ID for security). |
In version 11.5.0 the timeout is set to 10 s. |
OK, thanks. I have updated to the new version and restarted the RasPi. I will monitor it for the next few weeks to see if it stops again and report back. |
Version 11.5.0 did not fix the problem. The Telegram bot had stopped receiving again when I tried it this morning. Again, this is around a week after the last redeploy of the flow sheet containing the Telegram bot node. |
Hm ... these kinds of errors are only reported when running on a RasPi. I have no idea why, but probably the network drivers behave a little bit differently. Can you maybe try to reduce the polling interval and test again? |
Some explanations about your findings: |
Karl, we are on a Starlink connection that works very well but still has odd outages (every few days) of up to a few minutes, mostly due to satellite availability, but occasionally we also get longer outages due to weather (for example, 30 minutes when a huge thunderstorm passed between us and the ground station).
Outages like this are common on radio-based links (we have no wired connections). So what can be done automatically to detect that polling has stopped, so that when we detect the link is restored we can restart polling automatically?
I use this to remotely open and close secure tunnels from this system to me (somewhere on the internet). They must be initiated from the Pi, as we are behind CGNAT, so dynamic IP doesn't work. So if I have no tunnel open, I have no way to restart NR.
Ta
On Wed, 20 Jul. 2022, 5:00 am Karl-Heinz Wind wrote:
Some explanations about your findings:
The configuration node that holds the token is responsible for polling messages from the TG server. If polling is disturbed or stopped entirely for some reason, the TG server stores the messages in a buffer. If you redeploy, polling is started again and you get all the messages from the server that were not fetched so far.
The sender node is not affected by this, as it sends the data via a standard HTTP message. If the network is down, this message is lost and you would get an error.
So the question is: what blocks or kills the polling on your RasPi?
|
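The buffering Karl describes matches how the Telegram Bot API's `getUpdates` long poll works: the client passes an `offset` and a `timeout`, and an update counts as confirmed only once `getUpdates` is called with an offset higher than its `update_id`; according to the Bot API documentation, unconfirmed updates are kept on the server for up to 24 hours. The sketch below models that loop with an injected `fetchUpdates` function so it runs without a network. `nextOffset`, `pollLoop` and `makeFakeServer` are hypothetical names, not the node's actual code.

```javascript
// Minimal sketch of a Telegram long-polling loop (illustrative; not the
// node's actual code). `fetchUpdates` is injected so the logic can run
// without a network; in real use it would wrap a getUpdates request.

// Offset for the next getUpdates call: highest update_id seen + 1.
// Calling getUpdates with that offset confirms every earlier update;
// unconfirmed updates stay buffered on the Telegram server.
function nextOffset(currentOffset, updates) {
  let offset = currentOffset;
  for (const u of updates) {
    offset = Math.max(offset, u.update_id + 1);
  }
  return offset;
}

// Run `rounds` long polls, passing each update to `onUpdate`.
// In this sketch offset 0 means "start from the earliest unconfirmed update",
// which is why a restarted poller receives messages sent while it was stalled.
async function pollLoop(fetchUpdates, onUpdate, { rounds, timeoutSec = 10 }) {
  let offset = 0;
  for (let i = 0; i < rounds; i++) {
    // Long poll: the server holds the request for up to timeoutSec seconds.
    const updates = await fetchUpdates({ offset, timeout: timeoutSec });
    updates.forEach(onUpdate);
    offset = nextOffset(offset, updates);
  }
  return offset;
}

// Fake server for experimentation: an update is dropped only once a request
// arrives with an offset greater than its update_id.
function makeFakeServer(pending) {
  return async ({ offset }) => {
    pending = pending.filter((u) => u.update_id >= offset);
    return pending.slice();
  };
}
```

In this model a "redeploy" is simply a new `pollLoop` against the same server: because nothing was confirmed while polling was stuck, the first request returns the whole backlog, which is the behaviour reported in this issue.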
Reduced the polling frequency (interval set to 3000 ms), and it still stopped after a few days. Have just set it to 5000 ms and will see. |
It went 21 days with the polling interval set to 5000 ms, so that is not really the fix. If the issue is that the node stops polling, we need a means to restart it automatically without a redeploy, which is not practical when the node is remote. |
I observe the same problem |
I could try to implement a kind of "auto-restart after x minutes"... but I am not sure whether this will help. |
@WombatHollow could you please turn on verbose logging in the node? |
I upgraded the RasPi to Bullseye and NR to 3.0.2 about 14 days ago, so I am now monitoring Telegram for dropouts. None so far, but there has been work on the flows that resulted in the flows or NR being restarted. I will continue to monitor, and if I can get a continuous month with no problems then it may have been fixed by NR 3.0.2/Bullseye. Also just updating this node to 14.3.0. |
OK, bumped the polling interval from 1000 to 2000 ms. I will watch it for the next few weeks when in
internet range.
Ian
On Mon, 18 July 2022, 05:05 Karl-Heinz Wind wrote:
Hm ... these kind of errors are only reported when running on a raspi. No
idea why but probably the network drivers behave a little bit differently.
Can you maybe try to reduce the polling interval and test again?
|
OK, the fault occurred again overnight. Here is an extract from the logs. NB: the RBE error is where the ping of 8.8.8.8 and 198.142.152.163 (hence two warnings) doesn't return a number, i.e. no internet; this looks like two periods of 30 seconds (which correlates with Starlink reporting two outages of 3 seconds). Also, when I restarted the Telegram bot (I changed the polling interval and restarted the flow), this error was reported... although it didn't stop normal operation from being restored. |
I will set up a RasPi to be able to reproduce that problem... |
@WombatHollow |
Thanks, I just need a way for NR to detect that the BOT has stopped polling (a Catch node doesn't seem to report this!).
What I will do is get NR to restart the bot each night; that way I will only lose my ability to send commands for the rest of that day should it stop.
Ian
On Sun, 16 Oct 2022, 04:10 Karl-Heinz Wind wrote:
@WombatHollow
I added a new node that allows you to stop and start the bot from within your flow.
Maybe you can use this in your flow whenever you detect that the network is down.
In that case you can stop the bot, and when the network is available again you can call start.
|
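Karl's suggestion (stop the bot when the network is down, start it again when it is back) can be driven by the ping results already collected in this flow. A minimal sketch, assuming, as described earlier in the thread, that a successful ping yields a number and a failed one yields anything else. The `{ command: 'stop' | 'start' }` objects and the name `makeNetworkGate` are hypothetical and would need mapping onto whatever message the new start/stop node actually expects.

```javascript
// Hypothetical helper: turn a stream of ping results into stop/start commands
// for the bot. ASSUMPTIONS: a successful ping yields a number (round-trip ms)
// and a failed one yields anything else; the { command } shape sent to the
// control node is illustrative and must be checked against the node's docs.
function makeNetworkGate(requiredSuccesses = 3) {
  let online = true; // assume the link is up at start
  let successes = 0; // consecutive successful pings while offline

  return function onPing(result) {
    const ok = typeof result === "number" && Number.isFinite(result);
    if (online && !ok) {
      // First failed ping: stop the bot while the link is down.
      online = false;
      successes = 0;
      return { command: "stop" };
    }
    if (!online && ok) {
      // Require several good pings in a row before restarting.
      successes += 1;
      if (successes >= requiredSuccesses) {
        online = true;
        return { command: "start" };
      }
    } else if (!online && !ok) {
      successes = 0; // link flapped again; restart the count
    }
    return null; // no state change, nothing to send
  };
}
```

Waiting for a few consecutive good pings avoids restarting the bot in the middle of a short Starlink drop-out. Inside a Node-RED Function node, `online` and `successes` could live in `context` instead of a closure.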
If you want to, you can even restart every few minutes... that should not be a problem. |
Just to find out if it would really work |
@WombatHollow |
@WombatHollow maybe this can be used |
Just noticed: the Telegram Control node continues to report every 10 seconds even though the polling interval is set to 1000 ms. |
Yes, I wondered too. However, a long poll times out after 10 s when no new messages are available. You can test it: when you send something to the bot, the poll returns immediately. |
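A little arithmetic shows why the cadence is dominated by the 10 s long-poll timeout rather than the 1000 ms polling interval. The sketch assumes (this is not confirmed in the thread) that the next request is issued `interval` ms after the previous long poll returns; `pollStartTimes` is a hypothetical helper.

```javascript
// Sketch: start time of each long poll, ASSUMING the next request is
// scheduled `intervalMs` after the previous response arrives.
function pollStartTimes(responseDurationsMs, intervalMs) {
  const starts = [];
  let t = 0;
  for (const d of responseDurationsMs) {
    starts.push(t);
    t += d + intervalMs; // wait for the response, then the polling interval
  }
  starts.push(t); // start of the poll after the last response
  return starts;
}

// Idle bot: the first two polls run to the 10 s timeout; the third returns
// after 50 ms because a message arrived.
console.log(pollStartTimes([10000, 10000, 50], 1000));
// → polls start at 0, 11000, 22000 and 23050 ms
```

While idle, polls are spaced by roughly timeout + interval; as soon as a message arrives the poll returns early and the spacing collapses to about the interval.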
@WombatHollow I added a second output to the control to check isOnline |
I would roll that update back. I installed it and it ran OK, but when I enabled the 2nd output, NR started restarting every minute. I disabled the 2nd output and redeployed, and the restarts went away, but there is a red triangle on the control node stating invalid properties for -interval and -timeout. |
Re the original problem: it occurred again a few nights ago, but the little detection circuit detected it and restarted the bot. The log had more data in it, so I have attached it. The bot was restarted at 2:54 AM on 20 Oct. |
What URL did you use? |
I reworked the offline and 2nd output code... please try again with the next version, 14.8.6. |
I was using the default URL, but I did type in the interval and timeout (defaults). Will try the new version later today. |
The problem still exists. Here is the log: 23 Oct 12:34:03 - [red] Uncaught Exception: |
Can you send me your flow so that I can reproduce it? |
As below, but with the second output of the Telegram control node enabled. |
@WombatHollow thanks for the flow, I guess I found a bug in the configuration of the control node. |
OK, Control node now appears to work OK. |
Logs from last night: there was a spell around 4PM where the Telegram BOT reported 'EFATAL: Error: read ECONNRESET', but it didn't lose its polling of Telegram, nor was a restart of the bot triggered by the flow (see above). The Starlink modem showed only 35 seconds of internet downtime overnight. The Node-RED flows (Ping) detected only one internet ping timeout (at 26 Oct 03:57:37; the RBE warning is because ping returned a non-number). `26 Oct 03:52:16 - [warn] [telegram bot:1637792c.79a397] EFATAL: Error: read ECONNRESET` |
@WombatHollow thanks. The errors you can see in the log are generated by the polling loop inside the bot lib. So during last night the bot did not crash, is that correct? |
No, the bot was still working in the morning. My timeout detector was not triggered (as it waits longer), so I guess your restart inside the control node restored polling first.
I wasn't aware you had implemented a restart function, thanks for that.
I will monitor and, when happy, remove my extra restart functionality.
On Wed, 26 Oct 2022, 20:02 Karl-Heinz Wind wrote:
@WombatHollow thanks,
so the bot continued polling, which is good.
The errors you can see in the log are generated by the polling loop inside the bot lib.
The offline detection is a loop that runs next to the polling loop as a kind of watchdog.
It triggers if the polling loop locks up.
So during last night the bot did not crash, is that correct?
|
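The watchdog Karl describes, a second loop that fires when the polling loop stops making progress, can be sketched in a few lines. This is an illustration of the pattern only, not the control node's implementation; `PollWatchdog`, `kick`, `check` and the `restartPolling` callback in the usage comment are hypothetical names.

```javascript
// Minimal watchdog sketch running beside a polling loop (illustrative only).
// The polling loop calls kick() whenever a poll completes (with or without
// messages); check() runs periodically and fires onStall when no poll has
// completed for more than maxSilenceMs.
class PollWatchdog {
  constructor(maxSilenceMs, onStall, now = Date.now()) {
    this.maxSilenceMs = maxSilenceMs;
    this.onStall = onStall;
    this.lastKick = now;
  }

  kick(now = Date.now()) {
    this.lastKick = now;
  }

  check(now = Date.now()) {
    if (now - this.lastKick > this.maxSilenceMs) {
      this.lastKick = now; // give the restarted poller a full window
      this.onStall();
      return true;
    }
    return false;
  }
}

// Usage (hypothetical restartPolling):
// const dog = new PollWatchdog(60000, () => restartPolling());
// setInterval(() => dog.check(), 5000);
```

Because a long poll that simply times out still completes, `maxSilenceMs` only has to exceed one full poll cycle (about the 10 s timeout plus the polling interval) with some margin; a shorter value would restart a healthy but idle bot.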
@WombatHollow thanks for your detailed report. The one thing that is really strange: |
I guess that I must handle this ECONNRESET internally. Maybe I can catch that somewhere and try to reestablish the connection automatically... |
I do see a few other errors in the log that might be related; this one was EAI_AGAIN. |
Yes, the logs are created when there is a problem. But I guess only the error indicating a connection reset causes the problems you described. |
@WombatHollow Error: getaddrinfo EAI_AGAIN api.telegram.org
ECONNRESET means that the node called the server, but the network was lost while it was waiting for the response.
ETIMEDOUT means that the network was OK but the response was not received in time, which could indicate that either the Telegram server does not answer or the network is broken somewhere far away...
Indeed, the node creates a long poll of 10 s, which means that the GET call to Telegram is blocked for a maximum of 10 seconds on the server side. If messages arrive at the TG server, the response is created within those 10 s; otherwise the call times out without returning any messages... The node will then continue polling after a while with the next long poll. |
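Handling ECONNRESET internally, as Karl suggests, could start from a classifier that separates transient network failures (retry with back-off) from other errors. A sketch, limited to the three codes discussed here; matching on the message text is an assumption based on log lines such as "EFATAL: Error: read ECONNRESET", where the original code appears wrapped. `isTransientNetworkError` and `backoffDelayMs` are hypothetical helpers, not part of the bot library.

```javascript
// Sketch: classify a polling error as a transient network problem.
// ASSUMPTION: wrapped errors keep the original code in their message text,
// as in the log line "EFATAL: Error: read ECONNRESET".
const TRANSIENT_CODES = ["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"];

function isTransientNetworkError(err) {
  if (!err) return false;
  if (TRANSIENT_CODES.includes(err.code)) return true;
  const text = String(err.message || "");
  return TRANSIENT_CODES.some((code) => text.includes(code));
}

// Exponential back-off for reconnect attempts, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Capping the delay keeps reconnect attempts frequent enough to recover within a minute or so after a longer outage, such as the 30-minute thunderstorm outage mentioned earlier.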
@WombatHollow [{"id":"a7eb189d.f5e538","type":"comment","z":"759ef5a1.faa53c","name":"Warm reboot","info":"","x":330,"y":120,"wires":[]},{"id":"a06415d6.7ed3c8","type":"function","z":"759ef5a1.faa53c","name":"Request Token","func":"msg.payload = {\n \"client_id\": \"node-red-editor\",\n \"grant_type\": \"password\",\n \"scope\": \"*\",\n \"username\": \"engineer\",\n \"password\": \"Pa$$w0rd\"\n}\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":340,"y":170,"wires":[["47aa4d07.5bae94"]]},{"id":"649f96c6.bc9128","type":"inject","z":"759ef5a1.faa53c","name":"Manual reload","repeat":"","crontab":"","once":false,"onceDelay":"","topic":"","payload":"","payloadType":"str","x":130,"y":170,"wires":[["a06415d6.7ed3c8"]]},{"id":"47aa4d07.5bae94","type":"http request","z":"759ef5a1.faa53c","name":"Token","method":"POST","ret":"txt","paytoqs":"ignore","url":"http://localhost:1880/auth/token","tls":"","persist":false,"proxy":"","authType":"","credentials":{},"x":490,"y":170,"wires":[["d69beb8b.7008a8"]],"info":"Note:\nEnsure the path is correct\nif httpAdminRoot: '/admin' is activated then \nthis needs to be added to\nhttp://localhost:1880/"},{"id":"d69beb8b.7008a8","type":"function","z":"759ef5a1.faa53c","name":"Confirm token","func":"// get the status of the request\nvar status = msg.statusCode;\n\nvar token = '';\nmsg.headers ={};\n\n//let node = feedback;\n\nswitch(status){\n case 200:\n node.log(\"Secure restart\");\n token = JSON.parse(msg.payload);\n token = 'Bearer '+token.access_token;\n msg.headers = {\n \"Authorization\": token,\n \"Node-RED-Deployment-Type\":\"reload\"\n }\n//msg.payload =\"\";\n break;\n case 204:\n node.log(\"Secure without restart\");\n break;\n case 400:\n node.warn(\"Bad request\");\n break;\n case 401:\n node.warn(\"Not authorized\");\n break;\n case 403:\n node.warn(\"Forbidden\");\n break;\n case 404:\n node.log(\"Unsecure restart\");\n msg.headers = {\n \"Node-RED-Deployment-Type\":\"reload\"\n }\n break;\n case 409:\n node.warn(\"Version mismatch\");\n break;\n case 500:\n node.error(\"Server Error\");\n break;\n default:\n node.warn(\"Unknown Error\");\n break;\n}\n\nmsg.payload = \"\";\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":640,"y":170,"wires":[["ccd455c1.f34318"]],"info":"Restart of Node-Red flows.\nWill check if the action needs security or not.\nNote: if the first 5 attemps return a statuscode 403 'forbidden'\nthen the server will break and only way to recover is then to\nrestart the service"},{"id":"ccd455c1.f34318","type":"http request","z":"759ef5a1.faa53c","name":"Restart","method":"POST","ret":"txt","paytoqs":"ignore","url":"http://localhost:1880/flows","tls":"","persist":false,"proxy":"","authType":"","credentials":{},"x":800,"y":170,"wires":[[]],"info":"Note:\nEnsure the path is correct\nif httpAdminRoot: '/admin' is activated then \nthis needs to be added to\nhttp://localhost:1880/"}] |
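The "Warm reboot" flow above requests an admin token from `/auth/token` and then POSTs to `/flows` with the `Node-RED-Deployment-Type: reload` header, which makes Node-RED reload its flows from storage and restart all nodes, including the Telegram bot. The same two steps as a standalone Node.js 18+ script are sketched below. `reloadFlows` is a hypothetical name; the base URL (append `httpAdminRoot` if one is configured) and credentials are placeholders, and the 404 handling copies the flow's "unsecured" branch.

```javascript
// Sketch only: a standalone Node.js (18+, for global fetch) version of the
// "Warm reboot" flow above. ASSUMPTIONS: baseUrl points at the admin API and,
// when adminAuth is enabled, the user is allowed to deploy.
async function reloadFlows(baseUrl, username, password, fetchImpl = fetch) {
  // 1. Ask for an access token (form-encoded password grant).
  const tokenRes = await fetchImpl(`${baseUrl}/auth/token`, {
    method: "POST",
    body: new URLSearchParams({
      client_id: "node-red-editor", // same client_id as the flow above
      grant_type: "password",
      scope: "*",
      username,
      password,
    }),
  });

  // 2. Build the reload request headers.
  const headers = { "Node-RED-Deployment-Type": "reload" };
  if (tokenRes.status === 200) {
    const { access_token: accessToken } = await tokenRes.json();
    headers.Authorization = `Bearer ${accessToken}`;
  } else if (tokenRes.status !== 404) {
    // Like the flow, treat 404 as "admin API not secured"; anything else fails.
    throw new Error(`token request failed with HTTP ${tokenRes.status}`);
  }

  // 3. Reload flows from storage, restarting every node.
  const reloadRes = await fetchImpl(`${baseUrl}/flows`, { method: "POST", headers });
  return reloadRes.status;
}

// Usage: reloadFlows("http://localhost:1880", "engineer", "Pa$$w0rd").then(console.log);
```

The `fetchImpl` parameter exists so the logic can be exercised with a stub; in normal use it defaults to the global `fetch`.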
Well, suddenly this bot has gone from frequent polling errors as above, and losing 5% of sent messages.
I then compared the flow JSON at this point with the one from 3 months ago, looking for differences, and found 3
I haven't yet tried to update node to v20 again (see issue #343). |
NR 2.2.2
Node-red-contrib-telegrambot 11.3.0
Raspbian on a Raspberry Pi 3B
The node works well initially with the Telegram Sender, Telegram Receiver and Telegram Command nodes, but after a few days the Receiver and Command nodes stop producing any output for incoming Telegram messages, while the Sender node continues to function normally. Originally I restarted the RasPi (as I could do that remotely easily) to recover functionality, but I found I could also recover it by tweaking the Telegram Receiver node and deploying only the modified nodes: the Telegram Receiver and Command nodes then all start working again. When redeployed, both the Receiver and Command nodes produce output for the Telegram messages that were sent before the redeploy and not processed at the time. It is as if the messages were stuck in a queue and redeploying a Receiver node unstuck it.
The problem has existed for roughly the last 12 to 18 months, since I started using the Telegram nodes.
The flow consists of a Sender node, a Receiver node and 6 Command nodes, all of which work initially. There is also an Event node which does nothing (I am still trying to work out what it does).
Here is the NR JSON for the bot, the Receiver and one Command node:
[{"id":"f38dbfb9.7221d","type":"telegram command","z":"8689ea2a.fc96b8","name":"","command":"^\\/[Nn][Rr]$","bot":"1637792c.79a397","strict":false,"hasresponse":true,"useregex":true,"removeregexcommand":true,"outputs":2,"x":80,"y":560,"wires":[["14c19955.dc6347","2e389571.edf8ba"],["317ccc1d.ee1234"]]},{"id":"68541946.e07118","type":"telegram receiver","z":"8689ea2a.fc96b8","name":"","bot":"1637792c.79a397","saveDataDir":"","filterCommands":false,"x":130,"y":260,"wires":[["ce8beed4.49563","3c8ebbf9.cc6314"],["15f00161.0ee8bf","d83f843126120f0b"]]},{"id":"1637792c.79a397","type":"telegram bot","botname":"WombatHollow_bot","usernames":"IanHarrison","chatids":"-574707818","baseapiurl":"","updatemode":"polling","pollinterval":"1000","usesocks":false,"sockshost":"","socksport":"6667","socksusername":"anonymous","sockspassword":"","bothost":"","botpath":"","localbotport":"8443","publicbotport":"8443","privatekey":"","certificate":"","useselfsignedcertificate":false,"sslterminated":false,"verboselogging":false}]