
Memory leak in OxalisAlgorithmSuiteLoader #120

Closed
jonassss opened this issue Apr 23, 2020 · 6 comments
Labels
bug (Something isn't working), Pending PR (Fixed, waiting for Pull Request)
Milestone

Comments

@jonassss commented Apr 23, 2020

Hi,

We pulled the newest commit (cf55f82) to evaluate upgrading our solution.
However, during stress testing the solution eventually crashed.
The test consists of sending a large number of identical requests.

We also tried stopping the test before the solution crashed and observing the application's memory usage over time. We would expect the consumed memory to drop after a period of inactivity, but that is not what we observe.

[Screenshot: application memory usage over time, 2020-04-23 at 10:59]

Based on the two tests, we believe there is a memory leak in the objects held by OxalisAlgorithmSuiteLoader.BUS_MAP.
We took a heap dump of the application after 12 hours of inactivity; the results for this object are as follows:

no.difi.oxalis.as4.util.OxalisAlgorithmSuiteLoader.BUS_MAP - 592,130K
    org.apache.cxf.bus.extension.ExtensionManagerBus.extensions - 499,161K
        java.util.concurrent.ConcurrentHashMap.values - 489,030K
            org.apache.cxf.ws.policy.PolicyInterceptorProviderRegistryImpl.entries - 241,036K
            org.apache.cxf.ws.policy.AssertionBuilderRegistryImpl.registeredBuilders - 137,869K
            org.apache.cxf.wsdl11.WSDLManagerImpl.registry - 62,368K
            (100 more references totalling 47,750K)
    org.apache.cxf.bus.extension.ExtensionManagerBus.extensionManager - 86,956K
    (9 more references totalling 6,011K)

Note that we could be wrong about the exact source of the leak.
In any case, the application does not seem to recover from large volumes of requests - meaning we cannot introduce the newest version into production.
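
For illustration, here is a minimal sketch of the pattern we suspect (class and method names other than BUS_MAP are simplified and hypothetical; this is not the actual Oxalis source): a static map that is only ever written to, so every cached Bus, together with the CXF extensions it references, stays strongly reachable for the lifetime of the JVM.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.cxf.Bus;

    public class SuspectedLeakSketch {

        // A static map that only grows: every Bus put here (and the extension objects it
        // references, e.g. PolicyInterceptorProviderRegistryImpl) remains strongly reachable.
        private static final Map<String, Bus> BUS_MAP = new ConcurrentHashMap<>();

        public void configureAlgorithmSuite(Bus bus) {
            // The map is only consulted to decide whether the registration code should run.
            if (!BUS_MAP.containsKey(bus.getId())) {
                registerAlgorithmSuite(bus);   // hypothetical registration step
                BUS_MAP.put(bus.getId(), bus); // the Bus reference is never read back or removed
            }
        }

        private void registerAlgorithmSuite(Bus bus) {
            // Placeholder for the actual algorithm-suite registration against the bus.
        }
    }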

@FrodeBjerkholt added the bug label Apr 23, 2020
@FrodeBjerkholt modified the milestones: 4.1.9, 4.1.10 Apr 23, 2020
@FrodeBjerkholt (Contributor)

Thanks for the thorough bug report - I will look into it. A possible solution is to replace the HashMap with a cache.
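
A hedged sketch of what such a cache could look like, using a Guava cache with weak values so that a Bus no longer referenced elsewhere can be garbage collected (the names are illustrative, not the actual fix):

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    import org.apache.cxf.Bus;

    public class CachedBusRegistrySketch {

        // weakValues() lets the JVM reclaim a Bus (and its extensions) once nothing else
        // references it; maximumSize/expireAfterAccess would be alternative ways to bound growth.
        private static final Cache<String, Bus> BUS_CACHE = CacheBuilder.newBuilder()
                .weakValues()
                .build();

        public void configureAlgorithmSuite(Bus bus) {
            if (BUS_CACHE.getIfPresent(bus.getId()) == null) {
                registerAlgorithmSuite(bus); // hypothetical registration step
                BUS_CACHE.put(bus.getId(), bus);
            }
        }

        private void registerAlgorithmSuite(Bus bus) {
            // Placeholder for the actual algorithm-suite registration against the bus.
        }
    }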

@jonassss (Author) commented Apr 23, 2020

Forgot to mention that during the tests we saw increasingly higher response times.
The response times were the same even hours after we stopped the tests.

To me, it looks like the assertion data from "old" transactions is not being garbage collected.
My knowledge is limited here, but this does look like an underlying issue in the Apache CXF library.

Edit:
On closer inspection, the map looks unnecessary.
It is only ever used to check whether the code that sets the value should be executed.
As a result it holds a reference to an object that is never actually retrieved from the map - which means you could achieve the same result with a set and a check on whether it contains the key (see the sketch below).

However, when inspecting the logs we see that the key is actually never matched, meaning that caching IDs to avoid duplicate registrations is unnecessary (at least in our use case).
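
As a concrete illustration of the set-based alternative (hypothetical names, assuming the map is keyed by the bus id), the same "register only once" behaviour can be had without pinning any Bus in a static field:

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.cxf.Bus;

    public class SetBasedRegistrySketch {

        // Only the bus ids are retained; no Bus object is held in a static field.
        private static final Set<String> REGISTERED_BUS_IDS = ConcurrentHashMap.newKeySet();

        public void configureAlgorithmSuite(Bus bus) {
            // add() returns false if the id was already registered, so the registration
            // code runs at most once per bus id.
            if (REGISTERED_BUS_IDS.add(bus.getId())) {
                registerAlgorithmSuite(bus); // hypothetical registration step
            }
        }

        private void registerAlgorithmSuite(Bus bus) {
            // Placeholder for the actual algorithm-suite registration against the bus.
        }
    }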

@FrodeBjerkholt (Contributor)

I have now made a hotfix-4.1.10 branch with a fix for this issue. I have also bumped CXF to 3.3.6 and WSS4J to 2.2.5, in case they have fixed some of the problems related to the assertion data. Can you try it out before I make a release?

@FrodeBjerkholt (Contributor)

Strange that you always get a new bus. When I am testing, the same bus is reused.

@jonassss (Author) commented Aug 4, 2020

Hi, this issue was fixed internally some time ago.
Looks like I forgot I posted it here.

I don't know how to link an issue and a PR on GitHub, but I created a pull request here:
#129

@SuperJuell added the Pending PR label Feb 2, 2021
@aaron-kumar (Member)

The pull request is included in the release candidate: https://search.maven.org/search?q=g:network.oxalis%20AND%20a:oxalis-as4. I am closing this ticket now.
