Extremely slow for large vintage test suites #3050

ben-manes · 2022-10-02T08:17:11Z

Steps to reproduce

This Gradle project (junit-example.zip) demonstrates this performance regression using Guava's testlib to generate many test cases against HashMap. In total this runs 982,500 tests and that can be adjusted in the MapTests iterator size.

$ gradle junit4Test

BUILD SUCCESSFUL in 2m 2s
2 actionable tasks: 2 executed

$ gradle junit5Test

BUILD SUCCESSFUL in 7m 49s
2 actionable tasks: 1 executed, 1 up-to-date

Context

I am trying to migrate Caffeine onto JUnitPlatform for running the suites. The external suites are JUnit 3 or 4, and the internal is TestNG. In total this amounts to ~10M tests due to brute force testing of different cache configurations.

The caffeine:junitTest suite runs 354,295 tests, which typically takes ~3 minutes. This fails after 17 minutes with an out-of-memory error. The TestNG scanner is also broken, but in a more severe manner. It tries to instantiate non-test classes unless those are explicitly excluded. It then appears to try to to run tests not in the specified group, perhaps just to ignore them, and exhausts memory without making any progress.

For the smaller modules the integration works wonderfully and simplifies the build files. It is only for these larger test suites does it seem to struggle. That might be related to #2460 but it seems worse than simply being inefficient with memory.

diff --git a/caffeine/testing.gradle b/caffeine/testing.gradle
index 30072599..985094a4 100644
--- a/caffeine/testing.gradle
+++ b/caffeine/testing.gradle
@@ -67,10 +67,13 @@ tasks.register('junitTest', Test) {
   group = 'Cache tests'
   description = 'JUnit tests'
 
-  useJUnit()
   dependsOn(jar)
   tasks.test.dependsOn(it)
 
+  useJUnitPlatform {
+    includeEngines 'junit-vintage'
+  }
+
   failFast = true
   maxHeapSize = '2g'
   systemProperty 'caffeine.osgi.jar', project(':caffeine').jar.archivePath.path

The text was updated successfully, but these errors were encountered:

ben-manes · 2022-10-03T03:01:30Z

Profiling shows that there are some significant performance issues that might be easy to address. This would significantly reduce the cpu time and memory usage.

UniqueIdFormat.parse is very expensive and the top culprit. It could be much faster and the UniqueId lazily parsed.
Method.getParameterTypes() should be cached to avoid being copied (due to being an array return type)
MethodSorter deeply copies the method and string sorts. It should cache the methods to reuse toString, etc.
JUnit38ClassRunner.getAnnotations(tc) throws and ignores exceptions, causing stack fills for wasted cpu time

Attached is the JFR recording which can be opened by Java Mission Control, junit.jfr.zip. This provides some helpful suggestions and gives a good view of the execution.

I did the following which you can repeat for future investigations,

Applied the patch and ran ./gradlew caffeine:junitTest
Used jps to find the GradleWorkerMain pid
Verified the correct worker by using jstack <pid> | grep vintage
Recorded over a time duration, which is persisted once and stops recording when the elapsed.
jcmd <pid> JFR.start duration=12m filename=<dir>/junit.jfr settings=profile.jfc
Opened the profile in JMC and reviewed the allocations, hotspots, gc, etc.

ben-manes · 2022-10-22T05:41:09Z

5.10.0-SNAPSHOT increases the execution time by a full two minutes in the sample project, from 7m 24s to 9m 27s. I had hoped to offer some patches to discuss possible optimizations and make this usable, but I'm a bit dejected by this large degradation and the lack of concern towards performance. If this is the trend then I am afraid a few spot fixes won't be enough for me to feel comfortable using this framework except in small test scenarios.

marcphilipp · 2022-10-22T06:22:02Z

Please don't interpret the lack of response on this issue so far as a lack of interest. I think it's fair to say that your use case is a bit outside the norm. But I'd be happy to work with you on this. We should also put some regression tests in place to avoid such degradations in the future.

ben-manes · 2022-10-22T21:08:10Z

Thanks @marcphilipp. I did a basic optimization pass of UniqueIdFormat and that reduced a local snapshot to 5m 31s. I don't mean to imply this is better code, just ideas for consideration. This removes regex, decoding, allocations, wasted space, and map lookups.

diff --git a/junit-platform-engine/src/main/java/org/junit/platform/engine/UniqueIdFormat.java b/junit-platform-engine/src/main/java/org/junit/platform/engine/UniqueIdFormat.java
index cf931f79af..9f3b2dd509 100644
--- a/junit-platform-engine/src/main/java/org/junit/platform/engine/UniqueIdFormat.java
+++ b/junit-platform-engine/src/main/java/org/junit/platform/engine/UniqueIdFormat.java
@@ -10,22 +10,16 @@
 
 package org.junit.platform.engine;
 
-import static java.util.stream.Collectors.joining;
-import static java.util.stream.Collectors.toList;
-
 import java.io.Serializable;
 import java.io.UnsupportedEncodingException;
 import java.net.URLDecoder;
 import java.net.URLEncoder;
 import java.nio.charset.StandardCharsets;
 import java.util.Arrays;
-import java.util.HashMap;
-import java.util.List;
-import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 import org.junit.platform.commons.JUnitException;
-import org.junit.platform.commons.util.Preconditions;
+import org.junit.platform.commons.PreconditionViolationException;
 import org.junit.platform.engine.UniqueId.Segment;
 
 /**
@@ -61,27 +55,21 @@ class UniqueIdFormat implements Serializable {
 	private final char closeSegment;
 	private final char segmentDelimiter;
 	private final char typeValueSeparator;
-	private final Pattern segmentPattern;
-	private final HashMap<Character, String> encodedCharacterMap = new HashMap<>();
+	private final char[] characters;
+	private final String[] encodedCharacters;
 
 	UniqueIdFormat(char openSegment, char typeValueSeparator, char closeSegment, char segmentDelimiter) {
 		this.openSegment = openSegment;
 		this.typeValueSeparator = typeValueSeparator;
 		this.closeSegment = closeSegment;
 		this.segmentDelimiter = segmentDelimiter;
-		this.segmentPattern = Pattern.compile(
-			String.format("%s(.+)%s(.+)%s", quote(openSegment), quote(typeValueSeparator), quote(closeSegment)),
-			Pattern.DOTALL);
 
 		// Compute "forbidden" character encoding map.
-		// Note that the map is always empty at this point. Thus the use of
-		// computeIfAbsent() is purely syntactic sugar.
-		encodedCharacterMap.computeIfAbsent('%', UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent('+', UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent(openSegment, UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent(typeValueSeparator, UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent(closeSegment, UniqueIdFormat::encode);
-		encodedCharacterMap.computeIfAbsent(segmentDelimiter, UniqueIdFormat::encode);
+		characters = new char[] { '%', '+', openSegment, typeValueSeparator, closeSegment, segmentDelimiter };
+		encodedCharacters = new String[characters.length];
+		for (int i = 0; i < characters.length; i++) {
+			encodedCharacters[i] = UniqueIdFormat.encode(characters[i]);
+		}
 	}
 
 	/**
@@ -91,71 +79,117 @@ class UniqueIdFormat implements Serializable {
 	 * @throws JUnitException if the string cannot be parsed
 	 */
 	UniqueId parse(String source) throws JUnitException {
-		String[] parts = source.split(String.valueOf(this.segmentDelimiter));
-		List<Segment> segments = Arrays.stream(parts).map(this::createSegment).collect(toList());
-		return new UniqueId(this, segments);
+		int length = 1;
+		for (int i = 0; i < source.length(); i++) {
+			if (source.charAt(i) == segmentDelimiter) {
+				length++;
+			}
+		}
+
+		int index = 0;
+		int start = 0;
+		Segment[] segments = new Segment[length];
+		do {
+			int end = source.indexOf(segmentDelimiter, start);
+			if (end == -1) {
+				end = source.length();
+			}
+			if (start != end) {
+				segments[index] = createSegment(source.substring(start, end));
+				index++;
+			}
+			start = end + 1;
+		} while (start < source.length());
+		return new UniqueId(this, Arrays.asList(segments));
 	}
 
 	private Segment createSegment(String segmentString) throws JUnitException {
-		Matcher segmentMatcher = this.segmentPattern.matcher(segmentString);
-		if (!segmentMatcher.matches()) {
+		if ((segmentString.charAt(0) != openSegment)
+				&& (segmentString.charAt(segmentString.length() - 1) != closeSegment)) {
 			throw new JUnitException(String.format("'%s' is not a well-formed UniqueId segment", segmentString));
 		}
-		String type = decode(checkAllowed(segmentMatcher.group(1)));
-		String value = decode(checkAllowed(segmentMatcher.group(2)));
+		int splitAt = segmentString.indexOf(typeValueSeparator);
+		String type = decode(checkAllowed(segmentString.substring(1, splitAt)));
+		String value = decode(checkAllowed(segmentString.substring(splitAt + 1, segmentString.length() - 1)));
 		return new Segment(type, value);
 	}
 
 	private String checkAllowed(String typeOrValue) {
-		checkDoesNotContain(typeOrValue, this.segmentDelimiter);
-		checkDoesNotContain(typeOrValue, this.typeValueSeparator);
-		checkDoesNotContain(typeOrValue, this.openSegment);
-		checkDoesNotContain(typeOrValue, this.closeSegment);
+		for (int i = 0; i < typeOrValue.length(); i++) {
+			char c = typeOrValue.charAt(i);
+			checkDoesNotContain(typeOrValue, c, this.segmentDelimiter);
+			checkDoesNotContain(typeOrValue, c, this.typeValueSeparator);
+			checkDoesNotContain(typeOrValue, c, this.openSegment);
+			checkDoesNotContain(typeOrValue, c, this.closeSegment);
+		}
 		return typeOrValue;
 	}
 
-	private void checkDoesNotContain(String typeOrValue, char forbiddenCharacter) {
-		Preconditions.condition(typeOrValue.indexOf(forbiddenCharacter) < 0,
-			() -> String.format("type or value '%s' must not contain '%s'", typeOrValue, forbiddenCharacter));
+	private void checkDoesNotContain(String typeOrValue, char c, char forbiddenCharacter) {
+		if (c == forbiddenCharacter) {
+			throw new PreconditionViolationException(
+				String.format("type or value '%s' must not contain '%s'", typeOrValue, forbiddenCharacter));
+		}
 	}
 
 	/**
 	 * Format and return the string representation of the supplied {@code UniqueId}.
 	 */
 	String format(UniqueId uniqueId) {
-		// @formatter:off
-		return uniqueId.getSegments().stream()
-			.map(this::describe)
-			.collect(joining(String.valueOf(this.segmentDelimiter)));
-		// @formatter:on
+		int estimatedLength = 0;
+		for (Segment segment : uniqueId.getSegments()) {
+			estimatedLength += segment.getType().length() + 4 + segment.getValue().length();
+		}
+		StringBuilder builder = new StringBuilder(estimatedLength + /* slack */ 16);
+		for (int i = 0; i < uniqueId.getSegments().size(); i++) {
+			describe(builder, uniqueId.getSegments().get(i));
+			if (i != (uniqueId.getSegments().size() - 1)) {
+				builder.append(this.segmentDelimiter);
+			}
+		}
+		return builder.toString();
 	}
 
-	private String describe(Segment segment) {
-		String body = encode(segment.getType()) + typeValueSeparator + encode(segment.getValue());
-		return openSegment + body + closeSegment;
+	private void describe(StringBuilder builder, Segment segment) {
+		builder.append(openSegment);
+		encode(builder, segment.getType());
+		builder.append(typeValueSeparator);
+		encode(builder, segment.getValue());
+		builder.append(closeSegment);
 	}
 
-	private String encode(String s) {
-		StringBuilder builder = new StringBuilder(s.length());
+	private void encode(StringBuilder builder, String s) {
 		for (int i = 0; i < s.length(); i++) {
+			int index = -1;
 			char c = s.charAt(i);
-			String value = encodedCharacterMap.get(c);
-			if (value == null) {
+			for (int j = 0; j < characters.length; j++) {
+				if (c == characters[j]) {
+					index = j;
+					break;
+				}
+			}
+
+			if (index == -1) {
 				builder.append(c);
-				continue;
 			}
-			builder.append(value);
+			else {
+				builder.append(encodedCharacters[index]);
+			}
 		}
-		return builder.toString();
 	}
 
 	private static String decode(String s) {
-		try {
-			return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
-		}
-		catch (UnsupportedEncodingException e) {
-			throw new JUnitException("UTF-8 should be supported", e);
+		for (int i = 0; i < s.length(); i++) {
+			char c = s.charAt(i);
+			if ((c == '+') || (c == '%')) {
+				try {
+					return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
+				}
+				catch (UnsupportedEncodingException e) {
+					throw new JUnitException("UTF-8 should be supported", e);
+				}
+			}
 		}
+		return s;
 	}
-
 }

I next tried using an lru cache when creating the segments as immutable, common prefixes (engine, class), and interning could reduce memory pressure. In a cold run it was a bit faster, though a profiled run somehow was a minute faster so I'm surprised (4m 28s). It may just be the profile slowed things down to let the GC not pause so much, as I don't think the cache was helpful enough to be used. The attached jfr snapshot shows that 1m7s is spent on old gc which is excessive, and the heap shows a steady trend towards exhaustion (2gb). It's easier to dig into the allocation hotspots by opening this up in JProfiler, which shows that the Method.toString() in MethodSorter and ReflectionUtils is the next prime opportunity. That seems to account for 66GiB of allocations.

marcphilipp · 2022-10-23T17:31:27Z

Thanks for the patch! While we can still consider doing these optimizations, I found it curious that UniqueIdFormat.parse is even called here. Ironically, the regression in TestPlan that introduced all this parsing necessary was made in #2460 on your behalf. 😉

I've prepared a small PR (#3073) that avoids that from JUnit's own listeners (e.g. ExecutionListenerAdapter). However, Gradle's listener implementation also makes use of TestPlan.getTestIdentifier(String) and would have to be changed as well (which I could take care of once this optimization is released).

I ran an experiment with using JUnit's ConsoleLauncher instead of Gradle's Test task with promising results.

Before

Test run finished after 53365 ms
[    282503 containers found      ]
[         0 containers skipped    ]
[    282503 containers started    ]
[         0 containers aborted    ]
[    282503 containers successful ]
[         0 containers failed     ]
[    982500 tests found           ]
[         0 tests skipped         ]
[    982500 tests started         ]
[         0 tests aborted         ]
[    982500 tests successful      ]
[         0 tests failed          ]


BUILD SUCCESSFUL in 1m 39s

After

Test run finished after 24105 ms
[    282503 containers found      ]
[         0 containers skipped    ]
[    282503 containers started    ]
[         0 containers aborted    ]
[    282503 containers successful ]
[         0 containers failed     ]
[    982500 tests found           ]
[         0 tests skipped         ]
[    982500 tests started         ]
[         0 tests aborted         ]
[    982500 tests successful      ]
[         0 tests failed          ]


BUILD SUCCESSFUL in 1m 12s

Please note that the "Test run finished" timings do not include the discovery phase.

ben-manes · 2022-10-31T01:44:39Z

Thanks for the fix. I'll wait until the changes are in Gradle to retest. While I think this will help, I suspect that we'll have to make further changes here to get my build to run on JUnitPlatform. I haven't looked into the TestNG behavior but that appeared to not be correctly making test selections.

github-actions bot added 3rd-party: Gradle status: new labels Oct 2, 2022

marcphilipp added this to the 5.10 M1 milestone Oct 22, 2022

marcphilipp linked a pull request Oct 23, 2022 that will close this issue

Avoid unnecessarily parsing unique IDs #3073

Merged

7 tasks

marcphilipp modified the milestones: 5.10 M1, 5.9.2 Oct 28, 2022

marcphilipp added theme: execution status: in progress component: Vintage and removed status: new labels Oct 28, 2022

marcphilipp self-assigned this Oct 28, 2022

marcphilipp closed this as completed in #3073 Oct 30, 2022

marcphilipp removed the status: in progress label Dec 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extremely slow for large vintage test suites #3050

Extremely slow for large vintage test suites #3050

ben-manes commented Oct 2, 2022

ben-manes commented Oct 3, 2022

ben-manes commented Oct 22, 2022 •

edited

Loading

marcphilipp commented Oct 22, 2022

ben-manes commented Oct 22, 2022 •

edited

Loading

marcphilipp commented Oct 23, 2022 •

edited

Loading

ben-manes commented Oct 31, 2022

Extremely slow for large vintage test suites #3050

Extremely slow for large vintage test suites #3050

Comments

ben-manes commented Oct 2, 2022

Steps to reproduce

Context

ben-manes commented Oct 3, 2022

ben-manes commented Oct 22, 2022 • edited Loading

marcphilipp commented Oct 22, 2022

ben-manes commented Oct 22, 2022 • edited Loading

marcphilipp commented Oct 23, 2022 • edited Loading

Before

After

ben-manes commented Oct 31, 2022

ben-manes commented Oct 22, 2022 •

edited

Loading

ben-manes commented Oct 22, 2022 •

edited

Loading

marcphilipp commented Oct 23, 2022 •

edited

Loading