-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace DOM2DTM with DOM2DTMExt the patched version of DOM2DTM (fixes 1749) #1792
Conversation
@jvshahid thank you so much for this fix. I can confirm it fixes the memory issues with nested XPATH searching. However, there is still a pretty large and possibly exponential performance penalty for nested XPATH searches. Here is the test script I am using:
Here is the output with 5000 divs:
And here is the output with 50,000 divs:
So in both cases the nested version is several orders of magnitude slower than the unnested version. And it seems to get exponentially worse for larger numbers of matched elements. This should definitely be merged in to fix the memory issue, as crashing the JVM is much worse than being slow. However there is still something wrong with nokogiri java's handling of nested XPATH queries. Any idea why the performance might be so bad? Thanks again! |
I think this is a result of the
I will probably open a new issue to track the exponential behavior of |
@jvshahid thanks for the explanation and workaround. Unfortunately for us the badly performant code is in a third-party library that uses nokogiri, although we could fork it and modify. Please let me know when you open the new issue and I'll subscribe there too. Thanks again! |
Just a quick update. I looked at libxml source code and it looks like internally it is using an array similar to the current implementation in JRuby. I am still not sure why the JRuby implementation is that slow. |
created #1795 to track the performance of XMLNodeSet. This PR is ready to merge AFAIK. |
Going to rebase this on master to get tests to pass. |
why this old issue has merge and have the same issue #1873 |
Looks like we replaced DOM2DTM with DOM2DTMExt when we fixed
#1320 but forgot to replace it in
the DOM2DTM manager
fixes #1749