Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find-symbol can fail with "Requested array size exceeds VM limit" #245

Open
vemv opened this issue Mar 28, 2019 · 14 comments
Open

find-symbol can fail with "Requested array size exceeds VM limit" #245

vemv opened this issue Mar 28, 2019 · 14 comments
Assignees

Comments

@vemv
Copy link
Member

vemv commented Mar 28, 2019

Hi there,

I've faced this issue repeatedly. In the past I had worked around it via :jvm-opts ["-Xmx18G"], but today I hit it again (long time no see!). I could try setting an even higher bound, but I guess a GH issue will be more useful.

The error is:

  • When I cljr-rename-symbol, through natural usage,
  • I get the attached stacktrace printed to Emacs' *Messages*
Fetching symbol occurrences...
Error in nrepl-refactor: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
 at java.util.Arrays.copyOf (Arrays.java:3332)
    java.lang.AbstractStringBuilder.ensureCapacityInternal (AbstractStringBuilder.java:124)
    java.lang.AbstractStringBuilder.append (AbstractStringBuilder.java:448)
    java.lang.StringBuffer.append (StringBuffer.java:270)
    java.io.StringWriter.write (StringWriter.java:101)
    clojure.core$fn__7283.invokeStatic (core_print.clj:159)
    clojure.core/fn (core_print.clj:158)
    clojure.lang.MultiFn.invoke (MultiFn.java:234)
    clojure.core$pr_on.invokeStatic (core.clj:3674)
    clojure.core$pr_on.invoke (core.clj:3668)
    clojure.core$print_prefix_map$fn__7317.invoke (core_print.clj:233)
    clojure.core$print_sequential.invokeStatic (core_print.clj:66)
    clojure.core$print_prefix_map.invokeStatic (core_print.clj:229)
    clojure.core$print_map.invokeStatic (core_print.clj:238)
    clojure.core$fn__7344.invokeStatic (core_print.clj:266)
    clojure.core/fn (core_print.clj:263)
    clojure.lang.MultiFn.invoke (MultiFn.java:234)
    clojure.core$pr_on.invokeStatic (core.clj:3674)
    clojure.core$pr_on.invoke (core.clj:3668)
    clojure.core$print_prefix_map$fn__7317.invoke (core_print.clj:233)
    clojure.core$print_sequential.invokeStatic (core_print.clj:66)
    clojure.core$print_prefix_map.invokeStatic (core_print.clj:229)
    clojure.core$print_map.invokeStatic (core_print.clj:238)
    clojure.core$fn__7344.invokeStatic (core_print.clj:266)
    clojure.core/fn (core_print.clj:263)
    clojure.lang.MultiFn.invoke (MultiFn.java:234)
    clojure.core$pr_on.invokeStatic (core.clj:3674)
    clojure.core$pr_on.invoke (core.clj:3668)
    clojure.core$print_prefix_map$fn__7317.invoke (core_print.clj:233)
    clojure.core$print_sequential.invokeStatic (core_print.clj:66)
    clojure.core$print_prefix_map.invokeStatic (core_print.clj:229)
    clojure.core$print_map.invokeStatic (core_print.clj:238)

Interestingly, I can reproduce the error via the repl, skipping the middleware.

(refactor-nrepl.find.find-symbol/find-symbol   {:file "/path/to/file/page.clj",
                                                :ns "some.ns"
                                                :name "handler"
                                                :line 43
                                                :column 6
                                                :ignore-errors false})

...and further playing at the repl, I could find the exact expression at fault:

(->> (core/dirs-on-classpath)
(mapcat (partial core/find-in-dir (some-fn core/clj-file? core/cljc-file?)))
(mapcat (partial find-symbol-in-file fully-qualified-name ignore-errors)))))

I can run for example

(->> (core/dirs-on-classpath)
     (mapcat (partial core/find-in-dir (some-fn core/clj-file? core/cljc-file?)))
     (mapcat (partial find-symbol-in-file "clojure.core/+" false))
     count)

And it will reliably OOM.

Let me know if I can help you reproduce the error again. Perhaps you have it easy by setting a deliberately small -Xmx.

Victor

@vemv
Copy link
Member Author

vemv commented Mar 28, 2019

Side note, the pinpointed code seems a good target for parallelization. Perhaps if done with reducers, the change wouldn't be too invasive.

@vemv
Copy link
Member Author

vemv commented Mar 28, 2019

One last note, (->> (core/dirs-on-classpath) (mapcat (partial core/find-in-dir (some-fn core/clj-file? core/cljc-file?))) count) returns 276 for me, each file having a reasonable size. i.e. it's not an humongous project.

@vemv
Copy link
Member Author

vemv commented Mar 28, 2019

I further debugged this.

  • The culrpit boils down to refactor-nrepl.find.find-symbol/find-symbol-in-file
    • Specifically, its (ana/ns-ast file-content) call
  • One specific input that will cause it to OOM is the following:
(defn foo
  [{:keys [x]}]
  (go-loop []
    ))

...Namely:

  • clojure.core.async/go-loop must be there
  • A destructured argument must also be there.

You can apply this patch, and easily reproduce the issue:

  • lein repl
  • (require 'refactor-nrepl.analyzer)
  • (refactor-nrepl.analyzer/ns-ast (slurp "test/resources/testproject/src/com/example/vemv.clj")) -> OOM

@benedekfazekas
Copy link
Member

thanks for the really good analysis. we have several probs with core.async as far as i remember due to its macro heavy nature. will have a closer look, ta

@vemv
Copy link
Member Author

vemv commented Apr 9, 2019

Hey there,

have you had a chance (or foresee one soon enough)?

Btw, this issue report is a bit messy, it boils down to the last comment. I can close + re-create this ticket if that tidies up things.

@slipset
Copy link

slipset commented Dec 7, 2019

I've been hit by this as well.
On our largish codebase (70k lines of Clojure), it seems like the caching of the ast done in
refactor-nrepl.analyzer fails to handle the complete ast of our project (at least on my machine)

When commenting out https://github.com/clojure-emacs/refactor-nrepl/blob/master/src/refactor_nrepl/analyzer.clj#L53 it seems to run without the OOM, which I think makes sense, as I can imagine the total ast for this project would be rather large?

@expez
Copy link
Member

expez commented Dec 9, 2019

@slipset I just had the same error occur on a smaller project (8k lines of clojure).

error in process filter: Error in nrepl-refactor: java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf (Arrays.java:3332)
    java.lang.AbstractStringBuilder.ensureCapacityInternal (AbstractStringBuilder.java:124)
    java.lang.AbstractStringBuilder.append (AbstractStringBuilder.java:448)
    java.lang.StringBuffer.append (StringBuffer.java:270)
    java.io.StringWriter.write (StringWriter.java:101)
    cider.nrepl.print_method$eval31775$fn__31776.invoke (print_method.clj:69)
    clojure.lang.MultiFn.invoke (MultiFn.java:233)
    clojure.core$pr_on.invokeStatic (core.clj:3666)
    clojure.core$pr_on.invoke (core.clj:3660)
    clojure.core$print_prefix_map$fn__7041.invoke (core_print.clj:233)
    clojure.core$print_sequential.invokeStatic (core_print.clj:61)
    clojure.core$print_prefix_map.invokeStatic (core_print.clj:229)
    clojure.core$print_map.invokeStatic (core_print.clj:238)
    clojure.core$fn__7068.invokeStatic (core_print.clj:266)
    clojure.core/fn (core_print.clj:263)
    clojure.lang.MultiFn.invoke (MultiFn.java:233)
    clojure.core$pr_on.invokeStatic (core.clj:3666)
    clojure.core$pr_on.invoke (core.clj:3660)
    clojure.core$print_prefix_map$fn__7041.invoke (core_print.clj:233)
    clojure.core$print_sequential.invokeStatic (core_print.clj:61)
    clojure.core$print_prefix_map.invokeStatic (core_print.clj:229)
    clojure.core$print_map.invokeStatic (core_print.clj:238)
    clojure.core$fn__7068.invokeStatic (core_print.clj:266)
    clojure.core/fn (core_print.clj:263)
    clojure.lang.MultiFn.invoke (MultiFn.java:233)
    clojure.core$pr_on.invokeStatic (core.clj:3666)
    clojure.core$pr_on.invoke (core.clj:3660)
    clojure.core$print_prefix_map$fn__7041.invoke (core_print.clj:233)
    clojure.core$print_sequential.invokeStatic (core_print.clj:61)
    clojure.core$print_prefix_map.invokeStatic (core_print.clj:229)
    clojure.core$print_map.invokeStatic (core_print.clj:238)
    clojure.core$fn__7068.invokeStatic (core_print.clj:266)

This is the same stacktrace as in the OP, but the error is slightly different.

Looking at the stacktrace print_prefix_map occurs 6 times. Perhaps there's a problem with cycles here?

@mbuczko
Copy link
Contributor

mbuczko commented Dec 17, 2019

@slipset @vemv could you try to reproduce the problem against newest master?

@vemv
Copy link
Member Author

vemv commented Dec 18, 2019

Hi, thanks for attending the issue!

Yes I could still reproduce it.

Updated repro instuctions are as follow:

  • git checkout fresh https://github.com/clojure-emacs/refactor-nrepl master
  • curl https://gist.githubusercontent.com/vemv/a01b63102c3fb6b19d37f376e78546fd/raw/74d064cac449f6063f4751f6fa9f3e8e8fe53899/gistfile1.txt | git apply
  • (require 'refactor-nrepl.analyzer)
  • (refactor-nrepl.analyzer/ns-ast (slurp "testproject/src/com/example/vemv.clj")) -> OOM

My env:

  • Personal .lein profile stuff was left nil
  • High-end MBP

@mbuczko
Copy link
Contributor

mbuczko commented Dec 23, 2019

I think this is still related to the problem I tried to solve in #278. (refactor-nrepl.analyzer/ns-ast (slurp "test/resources/testproject/src/com/example/vemv.clj")) at some point tries to stringify the result to display it in REPL I guess, and we painfully end up with OOM.

Let's try the other way:

(def ast 
   (refactor-nrepl.analyzer/ns-ast (slurp "test/resources/testproject/src/com/example/vemv.clj")))

if we now try to stringify the result, like this:

(str ast)

again - we get OOM. This is more or less what I fixed in #278.

Lets instead redirect output directly to file:

(with-open [w (clojure.java.io/writer "/tmp/ast.txt")]
  (binding [*out* w] (pr ast)))

Now, after few minutes (and assuming you had enough free space on your drive) you can see where is the problem . Generated file is gigantic. I interrupted entire dump after reaching 37GB.

That's insane :)

@expez
Copy link
Member

expez commented Jun 28, 2021

@vemv Any hope of narrowing this done? I'm not sure how to tackle this one, but it's probably one of the most important outstanding issues in refactor-nrepl atm.

Since eastwood manages to use tools.analyzer on this code I'm guessing we're doing something we shouldn't somewhere, much like the bug @mbuczko closed in #278.

@vemv
Copy link
Member Author

vemv commented Jun 28, 2021

As a thing that maybe can help, this wall of flags helps me avoid all sort of OOMs (most specifically, for stack-heavy workloads):

[;; Prevents a specific type of OOMs:
 "-XX:CompressedClassSpaceSize=3G"
 ;; Prevents trivial StackOverflow errors:
 "-XX:MaxJavaStackTraceDepth=1000000"
 ;; Set a generous limit as the maximum that can be allocated, preventing certain types of OOMs:
 "-Xmx24G"
 ;; increase stack size x6, for preventing SO errors:
 ;;   (The current default can be found with
 ;;    `java -XX:+PrintFlagsFinal -version 2>/dev/null | grep "intx ThreadStackSize"`)
 "-Xss6144k"]

Also @slipset's comment seems on point. Keeping an AST cache seems a delicate business: outdated data can grow without control, correct? So something like a core.cache SoftCache might be more cautious.

Or keeping things simpler, one could have a per-run cache: things would be cached for a specific refactor-nrepl op, but not across ops. I realise one will have more cache misses this way, but one gains simplicity and a reduced chance of OOMs.

Finally, one nice thing I came up for Eastwood recently is jonase/eastwood@aa569a5 . It might not be related directly to our issue here, but omitting core.async exceptions can be good in other contexts / for other issues.

@expez
Copy link
Member

expez commented Jun 28, 2021

outdated data can grow without control, correct?

Looks like we have a bit of a bug in the cache, in the sense that it's append-only and old shit is never thrown out. On the other hand it won't grow on it's own. Unless you're using the refactoring a lot and/or have a very large change set I doubt this is it. I'll add another issue tracking that (should be very easy to fix), but I think this one here is something different.

edit: I just did a quick read-through of the cache code and it looks OK. It won't grow out of bounds. We also know that a single file can produce an AST that's too large to handle. Guess the next step might be to see what sort of recursion happens when such an AST is created...

@vemv
Copy link
Member Author

vemv commented Jun 28, 2021

Yes the #245 (comment) repro hints so

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants