-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JDK-17.0.13+11] GC Performance Defect on JDK17.0.13+11 #20687
Comments
Issue Number: 20687 |
I am able to reproduce the problem (the same?) just by running test multiple times on random 4 CPU machine with random Java21 with GC policy
I have GC verbose logs collected:
Looking to the first/last timestamps mentioned in GC verbose logs:
for
for
Please note that long run started 09:23:49 ended 09:24:45 has recorded times in the log started 09:23:50 ended 09:23:55. Used script:
Command line
@BurryPotter Would you please try to reproduce the problem with Java 21, just to be sure we both looking to the same problem? |
@BurryPotter It is possible test execution is slow down in JVM Shutdown hook. Would you please check if test subscribes to this hook does not delay JVM Shutdown itself? |
Yes, it is. I am able to reproduce the problem without this option (and it should be because |
It seems that you have the same defect as I do. However, in my case, this problem can only be reproduced with JDK17. Besides, when using the 'gencon' GC which has concurrent collection functionality, this problem may also be reproduced. The following are the corresponding output and the verbosegclog file verbosegc.tar.gz.
|
I was able to get more precise data: the delay occur after end of last System GC (hooks subscribed to gc-end event are included to GC, so after them). We can see in snap traces:
It is still be not clear where time is taken. There is synchronous Finalization processing after the end of Global GC, this is a next suspect to check. Also there is not clear what test is doing at this point (after end of the last System GC). |
Java -version output
Summary of problem
We discovered a performance defect in OpenJ9. We executed the program multiple times using the latest versions of JDK11, JDK17, and JDK21 respectively. Their average execution times were 10 seconds, 50 seconds, and 10 seconds respectively, which indicates that there might be performance-related defects in JDK17. Moreover, we think this is a very complex defect because the reproduction program is quite complicated and might be related to GC.
We tried to reduce the program, but it remained rather complex. However, most of the program logic is in the main program, which might help with locating the problem. The execution program is uploaded here reproduce_bug.tar.gz
Next, I will introduce the reproduction process in detail. Since this is a performance-related defect, I will first introduce the environment configuration. We reproduced this defect on two Linux servers respectively. They have 32-core CPUs and 40-core CPUs respectively, and both have 90GB of memory (the reproduction program doesn't actually need such a large amount of memory). And the defect was successfully reproduced in the Ubuntu 18.04 containers within Docker, using the JDK versions mentioned before.
When executing the program, it is necessary to specify the option
-Xgcpolicy:optavgpause -Xgcthreads30 -Xconcurrentlevel30 -Xms512m -Xmx4096m
. After multiple attempts, we found that there is a probability of reproducing this defect when using all collectors. The reproduction probability of optavgpause/optthruput is the highest, while that of gencon is the lowest. Besides, it is necessary to specify a relatively high number of GC threads. In this example, we specified 30 GC threads. This parameter needs to be adjusted according to the number of CPU cores of the server, requiring the number of threads to be close to the number of CPU cores. (According to the description in the OpenJ9 documentation, the default value of gcthreads is n - 1, where n is the number of CPU cores and the maximum is 64. That is to say, for servers with less than 64 cores, directly using the default value may trigger this defect. This also makes this defect more potentially dangerous.)We executed the program on JDK11, JDK17, and JDK21 respectively and used the time command to record the execution duration of the program. The following are the specific execution commands and results():
JDK11: 7.12s
JDK17: 61.1s
JDK21: 7.04s
From the above results, it can be found that the execution time of JDK17 is much longer than that of JDK11 and JDK21.
And when we reduced the number of GC threads, the execution time of JDK17 returned to normal and was consistent with that of JDK11 and JDK21:
After reduce the number of thread: 6.8s
From the above results, it can be seen that after reducing the GC threads, the execution time of the program decreased from 1 minute to 7 seconds, which indicates that this performance defect is related to GC to some extent.
In addition, when the collector was switched to
gencon
, the triggering probability of the defect was greatly reduced. During five tests, the problem of overly long execution time only occurred twice. While when usingoptavgpause/optthruput
, this defect was triggered almost every time.We hope that the information we provided can help with locating the problem. If possible, we would appreciate it if you could help explain the reasons for this performance issue. We are also quite curious about this phenomenon.
The text was updated successfully, but these errors were encountered: