
[JDK-17.0.13+11] GC Performance Defect on JDK17.0.13+11 #20687

Open
BurryPotter opened this issue Nov 27, 2024 · 6 comments

Comments

@BurryPotter

Java -version output

> ./jdk-11.0.25+9/bin/java -version
openjdk version "11.0.25" 2024-10-15
IBM Semeru Runtime Open Edition 11.0.25.0 (build 11.0.25+9)
Eclipse OpenJ9 VM 11.0.25.0 (build openj9-0.48.0, JRE 11 Linux amd64-64-Bit Compressed References 20241107_1233 (JIT enabled, AOT enabled)
OpenJ9   - 1d5831436e
OMR      - d10a4d553
JCL      - edded3f65c based on jdk-11.0.25+9)
> ./jdk-17.0.13+11/bin/java -version
openjdk version "17.0.13" 2024-10-15
IBM Semeru Runtime Open Edition 17.0.13.11 (build 17.0.13+11)
Eclipse OpenJ9 VM 17.0.13.11 (build openj9-0.48.0, JRE 17 Linux amd64-64-Bit Compressed References 20241015_886 (JIT enabled, AOT enabled)
OpenJ9   - 1d5831436e
OMR      - d10a4d553
JCL      - d17dd58f8d7 based on jdk-17.0.13+11)
> ./jdk-21.0.5+11/bin/java -version
openjdk version "21.0.5" 2024-10-15 LTS
IBM Semeru Runtime Open Edition 21.0.5.11 (build 21.0.5+11-LTS)
Eclipse OpenJ9 VM 21.0.5.11 (build openj9-0.48.0, JRE 21 Linux amd64-64-Bit Compressed References 20241015_307 (JIT enabled, AOT enabled)
OpenJ9   - 1d5831436e
OMR      - d10a4d553
JCL      - b1b311c53fe based on jdk-21.0.5+11)

Summary of problem

We discovered a performance defect in OpenJ9. We executed the same program multiple times on the latest builds of JDK 11, JDK 17, and JDK 21; the average execution times were about 10 seconds, 50 seconds, and 10 seconds respectively, which suggests a performance defect specific to JDK 17. We think this is a rather complex defect, because the reproduction program is quite involved and the slowdown appears to be GC-related.

We tried to reduce the program, but it remains fairly complex. However, most of the logic lives in the main program, which may help with locating the problem. The reproduction program is uploaded here: reproduce_bug.tar.gz

Next, I will describe the reproduction process in detail. Since this is a performance defect, I will first describe the environment. We reproduced the defect on two Linux servers, one with a 32-core CPU and one with a 40-core CPU, both with 90 GB of memory (the reproduction program does not actually need that much memory). The defect was also reproduced inside Ubuntu 18.04 Docker containers, using the JDK versions listed above.

When executing the program, it is necessary to specify the options -Xgcpolicy:optavgpause -Xgcthreads30 -Xconcurrentlevel30 -Xms512m -Xmx4096m. After multiple attempts, we found that the defect can be reproduced with some probability under every collector: optavgpause/optthruput reproduce it most often, while gencon reproduces it least often. It is also necessary to specify a relatively high number of GC threads; in this example we used 30, and the value needs to be adjusted to be close to the number of CPU cores of the server. (According to the OpenJ9 documentation, the default value of -Xgcthreads is n - 1, where n is the number of CPU cores, capped at 64. In other words, on servers with fewer than 64 cores the default value alone may trigger this defect, which makes it more dangerous in practice.)
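
For reference, the CPU count the JVM actually sees (and therefore the default GC thread count) can be checked with a minimal sketch like the following; it is not part of the reproduction program and the class name is only illustrative:

// Minimal sketch, not part of the reproduction program: print the CPU count
// the JVM sees (inside a container this may differ from the host); the
// documented default GC thread count is derived from it as n - 1.
public class CpuCountCheck {
    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors = " + n);
        System.out.println("documented default -Xgcthreads (n - 1) = " + (n - 1));
    }
}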

We executed the program on JDK 11, JDK 17, and JDK 21 and used the time command to record the execution duration. The specific commands and results are as follows:

JDK11: 7.12s

\time ./jdk-11.0.25+9_openj9/bin/java -Xgcpolicy:optavgpause -Xgcthreads30 -Xconcurrentlevel30 -Xms512m -Xmx4096m   org.eclipse.core.runtime.adaptor.EclipseStarter -debug 

14.87user 1.19system 0:07.12elapsed 225%CPU (0avgtext+0avgdata 250196maxresident)k

JDK17: 61.1s

\time ./jdk-17.0.13+11/bin/java  -Xgcpolicy:optavgpause -Xgcthreads30 -Xconcurrentlevel30 -Xms512m -Xmx4096m    org.eclipse.core.runtime.adaptor.EclipseStarter -debug

67.02user 1.46system 1:01.10elapsed 112%CPU (0avgtext+0avgdata 273172maxresident)k
0inputs+7912outputs (599major+21074minor)pagefaults 0swaps

JDK21: 7.04s

\time ./jdk-21.0.5+11/bin/java -Xgcpolicy:optavgpause -Xgcthreads30 -Xconcurrentlevel30 -Xms512m -Xmx4096m   org.eclipse.core.runtime.adaptor.EclipseStarter -debug
13.09user 1.82system 0:07.04elapsed 212%CPU (0avgtext+0avgdata 354088maxresident)k
0inputs+5360outputs (669major+13012minor)pagefaults 0swaps

From the above results, it can be seen that the execution time on JDK 17 is much longer than on JDK 11 and JDK 21.

When we reduced the number of GC threads, the execution time on JDK 17 returned to normal and was consistent with that of JDK 11 and JDK 21:
After reducing the number of GC threads: 6.8s

\time ./jdk-17.0.13+11/bin/java -Xgcpolicy:optavgpause -Xgcthreads10 -Xconcurrentlevel10 -Xms512m -Xmx4096m org.eclipse.core.runtime.adaptor.EclipseStarter -debug

10.79user 0.50system 0:06.80elapsed 166%CPU (0avgtext+0avgdata 230848maxresident)k
0inputs+5832outputs (728major+10786minor)pagefaults 0swaps

From the above results, it can be seen that after reducing the number of GC threads, the execution time decreased from about 1 minute to about 7 seconds, which suggests that this performance defect is at least partly GC-related.

In addition, when the collector was switched to gencon, the defect triggered much less often: in five runs, the excessively long execution time occurred only twice, whereas with optavgpause/optthruput the defect was triggered almost every time.

We hope the information provided helps with locating the problem. If possible, we would appreciate an explanation of the cause of this performance issue; we are quite curious about this phenomenon.


Issue Number: 20687
Status: Open
Recommended Components: comp:gc, comp:vm, comp:jit

@dmitripivkine
Contributor

I am able to reproduce the problem (the same one?) just by running the test multiple times on a random 4-CPU machine with a random Java 21 build and GC policy optthruput:
output:

>>>>>  starting test iteration 10 (Wed Nov 27 09:23:43 PST 2024)
Debug options:
    file:/team/dmitri/20687/eclipse-dacapo/.options loaded
Time to load bundles: 21004
9.45user 0.33system 0:06.29elapsed 155%CPU (0avgtext+0avgdata 165324maxresident)k <---- short run
0inputs+512outputs (0major+8129minor)pagefaults 0swaps
<<<<<  completed test iteration 10 (Wed Nov 27 09:23:49 PST 2024)
>>>>>  starting test iteration 11 (Wed Nov 27 09:23:49 PST 2024) <--- started at 09:23:49
Debug options:
    file:/team/dmitri/20687/eclipse-dacapo/.options loaded
Time to load bundles: 21004
58.86user 0.42system 0:55.54elapsed 106%CPU (0avgtext+0avgdata 201452maxresident)k <---- long run
0inputs+520outputs (0major+9055minor)pagefaults 0swaps
<<<<<  completed test iteration 11 (Wed Nov 27 09:24:45 PST 2024) <--- ended 09:24:45
>>>>>  starting test iteration 12 (Wed Nov 27 09:24:45 PST 2024)
Debug options:
    file:/team/dmitri/20687/eclipse-dacapo/.options loaded
Time to load bundles: 21004
9.52user 0.33system 0:06.37elapsed 154%CPU (0avgtext+0avgdata 166464maxresident)k <---- short run
0inputs+512outputs (0major+8815minor)pagefaults 0swaps
<<<<<  completed test iteration 12 (Wed Nov 27 09:24:51 PST 2024)

I have GC verbose logs collected:

verbosegc.20241127.092343.153664.txt
verbosegc.20241127.092349.153693.txt <--- started at 09:23:49
verbosegc.20241127.092445.153724.txt

Looking at the first/last timestamps recorded in the GC verbose logs:
for verbosegc.20241127.092343.153664.txt

<exclusive-start id="2" timestamp="2024-11-27T09:23:43.796" intervalms="181.175">
<exclusive-end id="1149" timestamp="2024-11-27T09:23:49.215" durationms="5.929" />

for verbosegc.20241127.092349.153693.txt <--- long run

<exclusive-start id="2" timestamp="2024-11-27T09:23:50.120" intervalms="197.165">
<exclusive-end id="1149" timestamp="2024-11-27T09:23:55.406" durationms="5.564" />

for verbosegc.20241127.092445.153724.txt

<exclusive-start id="2" timestamp="2024-11-27T09:24:45.679" intervalms="193.523">
<exclusive-end id="1149" timestamp="2024-11-27T09:24:51.170" durationms="6.145" />

Please note that the long run, which started at 09:23:49 and ended at 09:24:45, has times recorded in its log only from 09:23:50 to 09:23:55.
This means the JVM spent most of the time in Shutdown.

Script used:

#!/bin/sh
# Run the given command in a loop until it exits with a non-zero status,
# printing a timestamped banner before and after each iteration.

command=$1
x=0
number=0
echo "Beginning loop"

while [ "$number" -eq 0 ];
do
	echo -e "\033[1;34m>>>>>  starting test iteration $x ("`date`")\033[0m"
	$command
	number=$?
	echo -e "\033[1;34m<<<<<  completed test iteration $x ("`date`")\033[0m"
	x=`expr $x + 1`
done

echo -e "\033[0;36mTest completed at iteration $x with exit value $number\033[0m"

Command line

../../scripts/run_forever.sh "time ../../jvm/jdk-21/bin/java -Xverbosegclog  -Xgcpolicy:optthruput -Xgcthreads4 -Xconcurrentlevel4 -Xms512m -Xmx4096m   org.eclipse.core.runtime.adaptor.EclipseStarter -debug"

-Xconcurrentlevel4 option should be irrelevant with -Xgcpolicy:optthruput

@BurryPotter Would you please try to reproduce the problem with Java 21, just to be sure we are both looking at the same problem?

@dmitripivkine
Contributor

@BurryPotter It is possible that test execution is slowed down in a JVM shutdown hook. Would you please check whether the test subscribes to this hook and, if so, whether it delays JVM shutdown itself?
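
For illustration only (this is hypothetical code, not taken from the test): a shutdown hook registered through Runtime.addShutdownHook runs during JVM shutdown, and a slow hook body adds its whole runtime to the shutdown phase, which would show up as wall-clock time after the last GC event in the verbose log:

// Hypothetical illustration, not taken from the test: a slow shutdown hook
// delays JVM shutdown by however long its body takes to run.
public class SlowShutdownHookDemo {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("shutdown hook started");
            try {
                Thread.sleep(50_000L); // simulate expensive cleanup work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("shutdown hook finished");
        }));
        System.out.println("main finished; JVM shutdown starts now");
    }
}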

@dmitripivkine
Contributor

-Xconcurrentlevel4 option should be irrelevant with -Xgcpolicy:optthruput

Yes, it is. I am able to reproduce the problem without this option (and that is expected, because -Xgcpolicy:optthruput is a pure Stop-The-World policy with no concurrency).

@BurryPotter
Author

It seems you are seeing the same defect as I am. However, in my case the problem can only be reproduced with JDK 17. Also, the problem can be reproduced when using the 'gencon' policy, which does perform concurrent collection. The corresponding output and the verbose GC log file verbosegc.tar.gz are below.

-e >>>>>  starting test iteration 10 (Thu Nov 28 12:33:12 UTC 2024)
...
74.40user 1.15system 1:08.90elapsed 109%CPU (0avgtext+0avgdata 275376maxresident)k
0inputs+11960outputs (1417major+20789minor)pagefaults 0swaps
-e <<<<<  completed test iteration 10 (Thu Nov 28 12:34:21 UTC 2024)
-e >>>>>  starting test iteration 11 (Thu Nov 28 12:34:21 UTC 2024)
...
15.28user 1.28system 0:08.29elapsed 199%CPU (0avgtext+0avgdata 238248maxresident)k
0inputs+9040outputs (1049major+14979minor)pagefaults 0swaps
-e <<<<<  completed test iteration 11 (Thu Nov 28 12:34:29 UTC 2024)

@dmitripivkine
Contributor

I was able to get more precise data: the delay occurs after the end of the last System GC (hooks subscribed to the gc-end event are included in the GC, so after them). We can see it in the snap traces:

17:45:40.043502637           *0x174700       j9mm.125  Exit       <Trc_MM_SynchronizeGCThreadsAndReleaseMain Exit
17:45:40.043518937                *0x0       j9mm.137  Event       Compact end: bytesmoved=0
17:45:40.043581967                 0x0       j9mm.90   Event       GlobalGC collect complete
17:45:40.043619817                 0x0       j9mm.475  Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=535243952/536870912
17:45:40.043641646                 0x0       j9mm.51   Event       SystemGC end: newspace=0/0 oldspace=535243952/536870912 loa=26844160/26844160
>>>>>> delay <<<<<<<
17:46:33.182463928            *0x1ac00      j9jit.44   Event       + (warm) java/util/concurrent/ConcurrentHashMap.casTabAt([Ljava/util/concurrent/ConcurrentHashMap$Node;ILjava/util/concurrent/ConcurrentHashMap$Node;Ljava/util/concurrent/ConcurrentHashMap$Node;)Z @ 0x7f1255202648-0x7f12552026a6/0x0-0x7f12552026a6 time=806us 0x82620 0x7f12bc024c78 RR=- Q_SZ=4 bcsz=21 mem=16384KB
17:46:33.182519327             0x1ac00      j9jit.2    Event        (warm) Compiling java/lang/AbstractStringBuilder.putStringAt(ILjava/lang/String;)V
17:46:33.186209112             0x1ac00      j9jit.44   Event       + (warm) java/lang/AbstractStringBuilder.putStringAt(ILjava/lang/String;)V @ 0x7f1255202708-0x7f1255202ad6/0x0-0x7f1255202ad6 time=3727us 0xbebd8 0x7f12bc024db8 RR=- Q_SZ=4 bcsz=19 mem=16384KB
17:46:33.186255060             0x1ac00      j9jit.2    Event        (warm) Compiling java/lang/AbstractStringBuilder.inflateIfNeededFor(Ljava/lang/String;)V

It is still not clear where the time is taken. There is synchronous Finalization processing after the end of the Global GC; that is the next suspect to check. It is also not clear what the test is doing at this point (after the end of the last System GC).
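
For illustration only (hypothetical code, not from the test): slow finalize() methods are one way time can be spent right after a System GC, because the finalization work queued by that collection still has to run. finalize() is deprecated but still executes on JDK 17:

// Hypothetical illustration, not from the test: objects with slow finalizers
// make wall-clock time appear after the GC itself has completed.
public class SlowFinalizerDemo {
    static class Slow {
        @Override
        @SuppressWarnings("deprecation") // finalize() is deprecated but still runs on JDK 17
        protected void finalize() throws Throwable {
            Thread.sleep(100L); // simulate expensive per-object cleanup
            super.finalize();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            new Slow(); // create unreachable objects that need finalization
        }
        long start = System.nanoTime();
        System.gc();              // analogous to the System GC seen in the trace
        System.runFinalization(); // best-effort wait for pending finalizers
        System.out.println("gc + finalization took "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}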
