Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely slow for large vintage test suites #3050

Closed
ben-manes opened this issue Oct 2, 2022 · 6 comments · Fixed by #3073
Closed

Extremely slow for large vintage test suites #3050

ben-manes opened this issue Oct 2, 2022 · 6 comments · Fixed by #3073

Comments

@ben-manes
Copy link

Steps to reproduce

This Gradle project (junit-example.zip) demonstrates this performance regression using Guava's testlib to generate many test cases against HashMap. In total this runs 982,500 tests and that can be adjusted in the MapTests iterator size.

$ gradle junit4Test

BUILD SUCCESSFUL in 2m 2s
2 actionable tasks: 2 executed

$ gradle junit5Test

BUILD SUCCESSFUL in 7m 49s
2 actionable tasks: 1 executed, 1 up-to-date

Context

I am trying to migrate Caffeine onto JUnitPlatform for running the suites. The external suites are JUnit 3 or 4, and the internal is TestNG. In total this amounts to ~10M tests due to brute force testing of different cache configurations.

The caffeine:junitTest suite runs 354,295 tests, which typically takes ~3 minutes. This fails after 17 minutes with an out-of-memory error. The TestNG scanner is also broken, but in a more severe manner. It tries to instantiate non-test classes unless those are explicitly excluded. It then appears to try to to run tests not in the specified group, perhaps just to ignore them, and exhausts memory without making any progress.

For the smaller modules the integration works wonderfully and simplifies the build files. It is only for these larger test suites does it seem to struggle. That might be related to #2460 but it seems worse than simply being inefficient with memory.

diff --git a/caffeine/testing.gradle b/caffeine/testing.gradle
index 30072599..985094a4 100644
--- a/caffeine/testing.gradle
+++ b/caffeine/testing.gradle
@@ -67,10 +67,13 @@ tasks.register('junitTest', Test) {
   group = 'Cache tests'
   description = 'JUnit tests'
 
-  useJUnit()
   dependsOn(jar)
   tasks.test.dependsOn(it)
 
+  useJUnitPlatform {
+    includeEngines 'junit-vintage'
+  }
+
   failFast = true
   maxHeapSize = '2g'
   systemProperty 'caffeine.osgi.jar', project(':caffeine').jar.archivePath.path
@ben-manes
Copy link
Author

Profiling shows that there are some significant performance issues that might be easy to address. This would significantly reduce the cpu time and memory usage.

  1. UniqueIdFormat.parse is very expensive and the top culprit. It could be much faster and the UniqueId lazily parsed.
  2. Method.getParameterTypes() should be cached to avoid being copied (due to being an array return type)
  3. MethodSorter deeply copies the method and string sorts. It should cache the methods to reuse toString, etc.
  4. JUnit38ClassRunner.getAnnotations(tc) throws and ignores exceptions, causing stack fills for wasted cpu time

Attached is the JFR recording which can be opened by Java Mission Control, junit.jfr.zip. This provides some helpful suggestions and gives a good view of the execution.

I did the following which you can repeat for future investigations,

  1. Applied the patch and ran ./gradlew caffeine:junitTest
  2. Used jps to find the GradleWorkerMain pid
  3. Verified the correct worker by using jstack <pid> | grep vintage
  4. Recorded over a time duration, which is persisted once and stops recording when the elapsed.
    jcmd <pid> JFR.start duration=12m filename=<dir>/junit.jfr settings=profile.jfc
  5. Opened the profile in JMC and reviewed the allocations, hotspots, gc, etc.

@ben-manes
Copy link
Author

ben-manes commented Oct 22, 2022

5.10.0-SNAPSHOT increases the execution time by a full two minutes in the sample project, from 7m 24s to 9m 27s. I had hoped to offer some patches to discuss possible optimizations and make this usable, but I'm a bit dejected by this large degradation and the lack of concern towards performance. If this is the trend then I am afraid a few spot fixes won't be enough for me to feel comfortable using this framework except in small test scenarios.

@marcphilipp
Copy link
Member

Please don't interpret the lack of response on this issue so far as a lack of interest. I think it's fair to say that your use case is a bit outside the norm. But I'd be happy to work with you on this. We should also put some regression tests in place to avoid such degradations in the future.

@marcphilipp marcphilipp added this to the 5.10 M1 milestone Oct 22, 2022
@ben-manes
Copy link
Author

ben-manes commented Oct 22, 2022

Thanks @marcphilipp. I did a basic optimization pass of UniqueIdFormat and that reduced a local snapshot to 5m 31s. I don't mean to imply this is better code, just ideas for consideration. This removes regex, decoding, allocations, wasted space, and map lookups.

diff --git a/junit-platform-engine/src/main/java/org/junit/platform/engine/UniqueIdFormat.java b/junit-platform-engine/src/main/java/org/junit/platform/engine/UniqueIdFormat.java
index cf931f79af..9f3b2dd509 100644
--- a/junit-platform-engine/src/main/java/org/junit/platform/engine/UniqueIdFormat.java
+++ b/junit-platform-engine/src/main/java/org/junit/platform/engine/UniqueIdFormat.java
@@ -10,22 +10,16 @@
 
 package org.junit.platform.engine;
 
-import static java.util.stream.Collectors.joining;
-import static java.util.stream.Collectors.toList;
-
 import java.io.Serializable;
 import java.io.UnsupportedEncodingException;
 import java.net.URLDecoder;
 import java.net.URLEncoder;
 import java.nio.charset.StandardCharsets;
 import java.util.Arrays;
-import java.util.HashMap;
-import java.util.List;
-import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 import org.junit.platform.commons.JUnitException;
-import org.junit.platform.commons.util.Preconditions;
+import org.junit.platform.commons.PreconditionViolationException;
 import org.junit.platform.engine.UniqueId.Segment;
 
 /**
@@ -61,27 +55,21 @@ class UniqueIdFormat implements Serializable {
 	private final char closeSegment;
 	private final char segmentDelimiter;
 	private final char typeValueSeparator;
-	private final Pattern segmentPattern;
-	private final HashMap<Character, String> encodedCharacterMap = new HashMap<>();
+	private final char[] characters;
+	private final String[] encodedCharacters;
 
 	UniqueIdFormat(char openSegment, char typeValueSeparator, char closeSegment, char segmentDelimiter) {
 		this.openSegment = openSegment;
 		this.typeValueSeparator = typeValueSeparator;
 		this.closeSegment = closeSegment;
 		this.segmentDelimiter = segmentDelimiter;
-		this.segmentPattern = Pattern.compile(
-			String.format("%s(.+)%s(.+)%s", quote(openSegment), quote(typeValueSeparator), quote(closeSegment)),
-			Pattern.DOTALL);
 
 		// Compute "forbidden" character encoding map.
-		// Note that the map is always empty at this point. Thus the use of
-		// computeIfAbsent() is purely syntactic sugar.
-		encodedCharacterMap.computeIfAbsent('%', UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent('+', UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent(openSegment, UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent(typeValueSeparator, UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent(closeSegment, UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent(segmentDelimiter, UniqueIdFormat::encode);
+		characters = new char[] { '%', '+', openSegment, typeValueSeparator, closeSegment, segmentDelimiter };
+		encodedCharacters = new String[characters.length];
+		for (int i = 0; i < characters.length; i++) {
+			encodedCharacters[i] = UniqueIdFormat.encode(characters[i]);
+		}
 	}
 
 	/**
@@ -91,71 +79,117 @@ class UniqueIdFormat implements Serializable {
 	 * @throws JUnitException if the string cannot be parsed
 	 */
 	UniqueId parse(String source) throws JUnitException {
-		String[] parts = source.split(String.valueOf(this.segmentDelimiter));
-		List<Segment> segments = Arrays.stream(parts).map(this::createSegment).collect(toList());
-		return new UniqueId(this, segments);
+		int length = 1;
+		for (int i = 0; i < source.length(); i++) {
+			if (source.charAt(i) == segmentDelimiter) {
+				length++;
+			}
+		}
+
+		int index = 0;
+		int start = 0;
+		Segment[] segments = new Segment[length];
+		do {
+			int end = source.indexOf(segmentDelimiter, start);
+			if (end == -1) {
+				end = source.length();
+			}
+			if (start != end) {
+				segments[index] = createSegment(source.substring(start, end));
+				index++;
+			}
+			start = end + 1;
+		} while (start < source.length());
+		return new UniqueId(this, Arrays.asList(segments));
 	}
 
 	private Segment createSegment(String segmentString) throws JUnitException {
-		Matcher segmentMatcher = this.segmentPattern.matcher(segmentString);
-		if (!segmentMatcher.matches()) {
+		if ((segmentString.charAt(0) != openSegment)
+				&& (segmentString.charAt(segmentString.length() - 1) != closeSegment)) {
 			throw new JUnitException(String.format("'%s' is not a well-formed UniqueId segment", segmentString));
 		}
-		String type = decode(checkAllowed(segmentMatcher.group(1)));
-		String value = decode(checkAllowed(segmentMatcher.group(2)));
+		int splitAt = segmentString.indexOf(typeValueSeparator);
+		String type = decode(checkAllowed(segmentString.substring(1, splitAt)));
+		String value = decode(checkAllowed(segmentString.substring(splitAt + 1, segmentString.length() - 1)));
 		return new Segment(type, value);
 	}
 
 	private String checkAllowed(String typeOrValue) {
-		checkDoesNotContain(typeOrValue, this.segmentDelimiter);
-		checkDoesNotContain(typeOrValue, this.typeValueSeparator);
-		checkDoesNotContain(typeOrValue, this.openSegment);
-		checkDoesNotContain(typeOrValue, this.closeSegment);
+		for (int i = 0; i < typeOrValue.length(); i++) {
+			char c = typeOrValue.charAt(i);
+			checkDoesNotContain(typeOrValue, c, this.segmentDelimiter);
+			checkDoesNotContain(typeOrValue, c, this.typeValueSeparator);
+			checkDoesNotContain(typeOrValue, c, this.openSegment);
+			checkDoesNotContain(typeOrValue, c, this.closeSegment);
+		}
 		return typeOrValue;
 	}
 
-	private void checkDoesNotContain(String typeOrValue, char forbiddenCharacter) {
-		Preconditions.condition(typeOrValue.indexOf(forbiddenCharacter) < 0,
-			() -> String.format("type or value '%s' must not contain '%s'", typeOrValue, forbiddenCharacter));
+	private void checkDoesNotContain(String typeOrValue, char c, char forbiddenCharacter) {
+		if (c == forbiddenCharacter) {
+			throw new PreconditionViolationException(
+				String.format("type or value '%s' must not contain '%s'", typeOrValue, forbiddenCharacter));
+		}
 	}
 
 	/**
 	 * Format and return the string representation of the supplied {@code UniqueId}.
 	 */
 	String format(UniqueId uniqueId) {
-		// @formatter:off
-		return uniqueId.getSegments().stream()
-			.map(this::describe)
-			.collect(joining(String.valueOf(this.segmentDelimiter)));
-		// @formatter:on
+		int estimatedLength = 0;
+		for (Segment segment : uniqueId.getSegments()) {
+			estimatedLength += segment.getType().length() + 4 + segment.getValue().length();
+		}
+		StringBuilder builder = new StringBuilder(estimatedLength + /* slack */ 16);
+		for (int i = 0; i < uniqueId.getSegments().size(); i++) {
+			describe(builder, uniqueId.getSegments().get(i));
+			if (i != (uniqueId.getSegments().size() - 1)) {
+				builder.append(this.segmentDelimiter);
+			}
+		}
+		return builder.toString();
 	}
 
-	private String describe(Segment segment) {
-		String body = encode(segment.getType()) + typeValueSeparator + encode(segment.getValue());
-		return openSegment + body + closeSegment;
+	private void describe(StringBuilder builder, Segment segment) {
+		builder.append(openSegment);
+		encode(builder, segment.getType());
+		builder.append(typeValueSeparator);
+		encode(builder, segment.getValue());
+		builder.append(closeSegment);
 	}
 
-	private String encode(String s) {
-		StringBuilder builder = new StringBuilder(s.length());
+	private void encode(StringBuilder builder, String s) {
 		for (int i = 0; i < s.length(); i++) {
+			int index = -1;
 			char c = s.charAt(i);
-			String value = encodedCharacterMap.get(c);
-			if (value == null) {
+			for (int j = 0; j < characters.length; j++) {
+				if (c == characters[j]) {
+					index = j;
+					break;
+				}
+			}
+
+			if (index == -1) {
 				builder.append(c);
-				continue;
 			}
-			builder.append(value);
+			else {
+				builder.append(encodedCharacters[index]);
+			}
 		}
-		return builder.toString();
 	}
 
 	private static String decode(String s) {
-		try {
-			return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
-		}
-		catch (UnsupportedEncodingException e) {
-			throw new JUnitException("UTF-8 should be supported", e);
+		for (int i = 0; i < s.length(); i++) {
+			char c = s.charAt(i);
+			if ((c == '+') || (c == '%')) {
+				try {
+					return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
+				}
+				catch (UnsupportedEncodingException e) {
+					throw new JUnitException("UTF-8 should be supported", e);
+				}
+			}
 		}
+		return s;
 	}
-
 }

I next tried using an lru cache when creating the segments as immutable, common prefixes (engine, class), and interning could reduce memory pressure. In a cold run it was a bit faster, though a profiled run somehow was a minute faster so I'm surprised (4m 28s). It may just be the profile slowed things down to let the GC not pause so much, as I don't think the cache was helpful enough to be used. The attached jfr snapshot shows that 1m7s is spent on old gc which is excessive, and the heap shows a steady trend towards exhaustion (2gb). It's easier to dig into the allocation hotspots by opening this up in JProfiler, which shows that the Method.toString() in MethodSorter and ReflectionUtils is the next prime opportunity. That seems to account for 66GiB of allocations.

@marcphilipp marcphilipp linked a pull request Oct 23, 2022 that will close this issue
7 tasks
@marcphilipp
Copy link
Member

marcphilipp commented Oct 23, 2022

Thanks for the patch! While we can still consider doing these optimizations, I found it curious that UniqueIdFormat.parse is even called here. Ironically, the regression in TestPlan that introduced all this parsing necessary was made in #2460 on your behalf. 😉

I've prepared a small PR (#3073) that avoids that from JUnit's own listeners (e.g. ExecutionListenerAdapter). However, Gradle's listener implementation also makes use of TestPlan.getTestIdentifier(String) and would have to be changed as well (which I could take care of once this optimization is released).

I ran an experiment with using JUnit's ConsoleLauncher instead of Gradle's Test task with promising results.

Before

Test run finished after 53365 ms
[    282503 containers found      ]
[         0 containers skipped    ]
[    282503 containers started    ]
[         0 containers aborted    ]
[    282503 containers successful ]
[         0 containers failed     ]
[    982500 tests found           ]
[         0 tests skipped         ]
[    982500 tests started         ]
[         0 tests aborted         ]
[    982500 tests successful      ]
[         0 tests failed          ]


BUILD SUCCESSFUL in 1m 39s

After

Test run finished after 24105 ms
[    282503 containers found      ]
[         0 containers skipped    ]
[    282503 containers started    ]
[         0 containers aborted    ]
[    282503 containers successful ]
[         0 containers failed     ]
[    982500 tests found           ]
[         0 tests skipped         ]
[    982500 tests started         ]
[         0 tests aborted         ]
[    982500 tests successful      ]
[         0 tests failed          ]


BUILD SUCCESSFUL in 1m 12s

Please note that the "Test run finished" timings do not include the discovery phase.

@ben-manes
Copy link
Author

Thanks for the fix. I'll wait until the changes are in Gradle to retest. While I think this will help, I suspect that we'll have to make further changes here to get my build to run on JUnitPlatform. I haven't looked into the TestNG behavior but that appeared to not be correctly making test selections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants