App in Homey gets set to Paused status #38

Open
SiskoBen opened this issue Jun 19, 2023 · 4 comments

SiskoBen commented Jun 19, 2023

Describe the bug
The Prometheus.IO app is randomly being set to paused status, which is only cleared by restarting the app.

Diagnostics report ID
This one was made yesterday, but I'm not sure if it is of any use: c423a8f9-8a7a-4f0d-84b2-c665ee3a9b27
Another instance happened while I was creating this bug report: bebdbc25-2950-4d7e-a98c-da4aee3438a4
Another instance just after 17:00 today: a99f6a78-2de3-4e0b-8e89-9b38f125ff8d
Again just after 20:00, even with scraping set to every 60 seconds (instead of 15 seconds): 6941bcf4-fc1e-49f0-a469-8c7182d7d944
And again just after 21:00: 6ff6f11f-d0a3-4c53-8c28-65a172d1546b

Configuration
Hardware revision: "Homey Pro (Early 2019)".
Firmware version: 8.1.4

Additional context
I recently added 17 virtual devices to Homey (13 with just the "measure power" capability and 4 others with both "measure power" and "meter power"). Before that there were no apparent issues. I also researched a bit and found somewhere that Homey automatically sets an app to a paused state when that app uses more than 80MB of memory.
The Prometheus.IO app regularly uses between 28 and 34MB of memory (while Athom/Homey says apps should not use more than 30MB), with occasional spikes to around 70MB, but I could not find any occurrence in Prometheus of the app using 80MB or more (possibly because the app is by then paused and no longer serving data to the Prometheus server).
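
If memory really is the trigger, one way to catch the spike would be for the app to log its own footprint; a minimal sketch in plain Node.js/TypeScript (the 10-second interval and 70MB warning threshold are arbitrary picks on my part, not anything the app actually does):

```typescript
// Log the process's resident set size periodically, so a spike is
// visible in the app's log even if Homey pauses it before the next scrape.
const WARN_BYTES = 70 * 1024 * 1024; // a guess, just under the reported ~80MB limit
const toMB = (bytes: number) => (bytes / 1024 / 1024).toFixed(1);

setInterval(() => {
  const { rss, heapUsed } = process.memoryUsage();
  const line = `rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB`;
  if (rss > WARN_BYTES) {
    console.warn(`approaching the pause threshold: ${line}`);
  } else {
    console.log(line);
  }
}, 10_000);
```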
I already tried removing some unused apps and some devices from Homey to see if that reduces the Prometheus.IO memory footprint, but it does not appear to make any perceivable difference. I have also set up a Homey Flow to restart the Prometheus.IO app at 3:33 am, but that does not prevent the issue either: it seems to happen at random, and the memory footprint does not seem to increase or decrease towards the time the app is stopped by Homey. The Prometheus server is scraping Homey/Prometheus.IO every 15 seconds.
I did find that the Grafana query `sum(rate(homey_exporter_self_time_seconds[$__rate_interval]))` shows a spike of around 30 seconds a few seconds before Prometheus.IO is paused, while typically this would be no more than 3 seconds.
The bulk of the spike comes from the three `action="updatedevicelist"` series (with `type="real"`, `type="system"` and `type="user"`; all three carry the labels `device="_updatedevicelist"`, `instance="192.168.34.161:9414"`, `job="prometheus"`, `name="_updatedevicelist"`, `zone="_updatedevicelist"`, `zones="_updatedevicelist"`). Just after the whole hour mark (at least that is what Grafana shows, 15-45 seconds past the hour) these contribute a total of 26-28 seconds; normally their spike just past the hour contributes about 5 seconds.

edit 19:14 CEST: I've now set the scraping interval of the Prometheus server to once every 60 seconds instead of once every 15 seconds, on the assumption that the multi-second spikes take so long that multiple scrape requests queue up and increase memory usage until the cause of the spike has resolved.

edit 21:17 CEST: Just after 20:00 CEST Prometheus.IO was paused again, even though the scraping interval was set to 60 seconds.

edit 21:33 CEST: I have now created a flow that restarts Prometheus.IO every hour (with a delay of 4 minutes), so that if it gets paused just after the hour it at least gets restarted.

@rickardp (Owner)

Thanks for the thorough report!

I got your diagnostic reports but they are unfortunately empty. I suspect the logs are cleared when the app gets killed. I also don't get any automatic crash reports from when this happens.

It could of course be an exceeded resource quota, like memory. I did not see a way to increase any resource request for the app, and if the limit is hard-coded, I guess it can be a bit harder to fix. Of course, optimizations might be possible.

If you check the historic memory usage, do you see any spike (in any app) before the pause?

@SiskoBen (Author)

I did some checking, but I cannot find any other app using that much memory. For now the workaround of restarting every hour is working for me (I'm missing just 0m45s-1m45s of data in Prometheus every hour).
Do you know what uses the bulk of the memory in your app? In other words, is there any point in me cutting down on devices to try to reduce the memory footprint?
Is there something particular that the Prometheus.IO app does around the hour mark? In other words, what could cause the memory footprint to rise so sharply (from around 30MB to over 2.5 times as much) that it occasionally triggers the Homey safeguard limit of 80MB?

@SiskoBen (Author)

Hi,
I've now changed my flow in Homey from restarting the app every hour to restarting it only when it has stopped responding.
I did this with a flow that triggers every minute and checks whether an HTTP GET of http://192.168.34.161:9414/metrics returns a "200" response (using the "HTTP request flow cards" app, https://homey.app/en-ie/app/com.internet/HTTP-request-flow-cards/); only when there is no "200" response is the Prometheus.IO app restarted. I also send myself an e-mail whenever the flow has to restart the Prometheus.IO app, but so far (after 24h) that has not been needed.
(Maybe the additional request keeps the app awake/busy enough for the memory use not to get out of hand.)
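
For reference, the same health check as a standalone script outside Homey (a sketch assuming Node 18+ for the built-in fetch; the restart itself is a Homey flow card in my setup, so it is only a placeholder function here):

```typescript
// Poll the exporter once a minute and react only when it stops
// answering with HTTP 200 (the same logic as the Homey flow).
const METRICS_URL = "http://192.168.34.161:9414/metrics";

async function restartPrometheusApp(): Promise<void> {
  // Placeholder: in Homey this is the "restart app" flow card plus an e-mail.
  console.warn("exporter unhealthy, triggering restart");
}

async function checkOnce(): Promise<void> {
  try {
    const res = await fetch(METRICS_URL, { signal: AbortSignal.timeout(10_000) });
    if (res.status !== 200) {
      await restartPrometheusApp();
    }
  } catch {
    await restartPrometheusApp(); // timeout or connection refused
  }
}

setInterval(checkOnce, 60_000);
```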
So for now the issue seems to be mitigated. I would not be offended if you close this issue due to lack of new debugging data.

@rickardp (Owner) commented Aug 9, 2023

Last time I checked, most memory was spent inside the Homey API. It usually prints warnings about excessive numbers of event listeners when there are a lot of devices. I suspect the Homey API was not really built for subscribing to every single device from the same app. I think it can be given another shot, given that the tooling and API have improved since I last attempted this.
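
For context, that warning comes from Node's EventEmitter, which by default warns once more than 10 listeners are attached for a single event. A minimal repro in plain Node/TypeScript (not the Homey API itself, just an illustration of the mechanism):

```typescript
import { EventEmitter } from "node:events";

// One shared emitter standing in for a per-device update stream.
const deviceUpdates = new EventEmitter();

// Attaching one listener per device: the 11th triggers
// "MaxListenersExceededWarning: Possible EventEmitter memory leak detected".
for (let i = 0; i < 17; i++) {
  deviceUpdates.on("update", () => {
    /* handle a capability change for one device */
  });
}

// Raising the limit (ideally before adding listeners) silences the warning,
// but every listener and whatever state it closes over still stays in memory.
deviceUpdates.setMaxListeners(100);
```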
