oximeter collector binary has surprisingly large heap usage after overnight instance creation loop #3808
Comments
I'm not actually that surprised about this one, though it's unfortunate. The design of The proper solution here is to finally implement de-registration of metric producers in Nexus. E.g., when an instance is destroyed,
@bnaecker WDYT about having something defensive from the Oximeter side too -- namely, if a producer can't be queried after a certain amount of time, we stop trying to collect from it?
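For illustration only, the defensive idea floated above (stop collecting from a producer that has not been reachable for some time) could be sketched roughly as follows. This is not oximeter's code: `ProducerTable`, its methods, and the 600-second limit in the usage note are all hypothetical names and values chosen for the sketch, and it does not settle how the duration should be picked.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical bookkeeping for the producers a collector polls.
/// None of these names come from oximeter itself.
struct ProducerTable {
    /// Longest time a producer may go without a successful collection.
    max_unreachable: Duration,
    /// Producer ID -> time of the last successful collection (or registration).
    last_success: HashMap<String, Instant>,
}

impl ProducerTable {
    fn new(max_unreachable: Duration) -> Self {
        Self { max_unreachable, last_success: HashMap::new() }
    }

    /// Start tracking a producer; registration counts as the first success.
    fn register(&mut self, id: &str, now: Instant) {
        self.last_success.insert(id.to_string(), now);
    }

    /// Record a successful collection, resetting the producer's deadline.
    fn record_success(&mut self, id: &str, now: Instant) {
        if let Some(last) = self.last_success.get_mut(id) {
            *last = now;
        }
    }

    /// Whether the producer is still being tracked.
    fn is_tracked(&self, id: &str) -> bool {
        self.last_success.contains_key(id)
    }

    /// Drop every producer that has been unreachable for longer than the
    /// limit, returning the removed IDs (sorted) so the caller can free
    /// whatever per-producer state it holds.
    fn prune(&mut self, now: Instant) -> Vec<String> {
        let limit = self.max_unreachable;
        let mut removed = Vec::new();
        self.last_success.retain(|id, last| {
            let keep = now.saturating_duration_since(*last) <= limit;
            if !keep {
                removed.push(id.clone());
            }
            keep
        });
        removed.sort();
        removed
    }
}
```

In this sketch, a collector would call `record_success` after each successful poll and `prune` periodically, for example with `ProducerTable::new(Duration::from_secs(600))`. Timestamps are passed in rather than read inside the methods so the expiry logic can be exercised without waiting in real time.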
That might be OK, though I'm a bit nervous about (1) picking a duration, and (2) getting
I'm opting to close this since (1) we understand the cause (never removing a producer-collector assignment) and (2) it will not get worse once #4495 is merged, in the absence of the edge-case race noted in that PR thread. I will be including schema updates that should mitigate the issue on existing deployments in the short term in a coming PR.
Repro steps:
Observed: A brand-new oximeter process's heap usage is about 8 MiB per pmap. After the overnight run this has ballooned to ~2.1 GiB.
Expected: Oximeter's heap usage remains relatively modest.
I don't have a good theory on this one. The next step is likely to find or cook up a DTrace script that'll let us find the culprit stacks.