openJcePlusTests_0_FAILED #19785

Closed
pshipton opened this issue Jun 28, 2024 · 18 comments

Comments

@pshipton
Member

Besides the test failure, there is too much output for TAP, which causes the job to fail rather than be marked unstable. I don't know what to search for in the output to find the actual failure.

https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk17_j9_extended.functional_ppc64le_linux_testList_2/620

11:03:26               [test]     [junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.037 sec
11:03:26               [test] 
11:03:26               [test] BUILD FAILED
11:03:26               [test] /home/jenkins/workspace/Test_openjdk17_j9_extended.functional_ppc64le_linux_testList_2/jvmtest/functional/OpenJcePlusTests/test.xml:75: Test ibm.jceplus.junit.TestMultithreadFIPS failed
11:03:26               [test] 
11:03:26               [test] Total time: 7 minutes 37 seconds
11:03:26          
11:03:26          BUILD FAILED
11:03:26          /home/jenkins/workspace/Test_openjdk17_j9_extended.functional_ppc64le_linux_testList_2/jvmtest/functional/OpenJcePlusTests/test.xml:35: Java returned: 1
11:03:26          
11:03:26          Total time: 7 minutes 39 seconds
11:03:26          -----------------------------------
11:03:26          openJcePlusTests_0_FAILED
11:03:26      duration_ms: 460929
11:03:26  ]: The incoming YAML document exceeds the limit: 3145728 code points.

jceplustests1.txt
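One possible way to locate the failure in a console log of this shape is to look for `[junit] Tests run:` summary lines with a nonzero failure or error count and report the most recent `executing <test>` line seen before each one, plus any `Test <class> failed` line from Ant. The sketch below assumes the line layout shown in the excerpts in this issue and assumes that the last test announced before a failing summary is the one that failed. The name `find_failures` is hypothetical and not part of any existing tooling.

```python
import re

# Patterns assume the console layout shown in this issue's log excerpts.
EXECUTING = re.compile(r"\[junit\] executing (\w+)")
SUMMARY = re.compile(r"\[junit\] Tests run: (\d+), Failures: (\d+), Errors: (\d+)")
ANT_FAILED = re.compile(r"Test (\S+) failed")


def find_failures(lines):
    """Return (suspect_methods, failed_classes) found in console log lines."""
    last_executing = None
    suspect_methods = []
    failed_classes = []
    for line in lines:
        m = EXECUTING.search(line)
        if m:
            # Remember the most recently announced test method.
            last_executing = m.group(1)
            continue
        m = SUMMARY.search(line)
        if m:
            failures, errors = int(m.group(2)), int(m.group(3))
            if (failures or errors) and last_executing:
                suspect_methods.append(last_executing)
            last_executing = None  # start fresh for the next summary
            continue
        m = ANT_FAILED.search(line)
        if m:
            failed_classes.append(m.group(1))
    return suspect_methods, failed_classes
```

Calling `find_failures(open("console.txt", encoding="utf-8"))` on a saved console log (the file name is a placeholder) returns the suspect method names and the failing test classes without needing the TAP results.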

@dmitripivkine
Contributor

There are a few tests with a "Fail. No exception thrown." message. Not sure whether this is the reported error case.

@pshipton
Member Author

Created adoptium/aqa-tests#5414 to deliver adoptium/aqa-tests#5409 to the v1.0.2-release aqa-tests branch, which might be the issue.

There are a few tests with a "Fail. No exception thrown." message. Not sure whether this is the reported error case.

I think as per #19596 they aren't the cause.

@pshipton
Member Author

pshipton commented Jul 3, 2024

Also of interest: the tested builds were not 0.46 builds, but they were tested using the 0.46 test material.

@jasonkatonica do you see any concerns with the failures? I've started new testing on 0.46 builds after delivering adoptium/aqa-tests#5414 so we can see if there are still failures.

@jasonkatonica
Contributor

At the moment, the 0.46 test material should work with the latest OpenJCEPlus code updates and vice versa. However, it would be good for us to use the 0.46 test material with the 0.46 builds.

I was just checking into the test code at https://github.com/adoptium/aqa-tests/blob/dd8774091c090d89917b27653206a668c804a082/functional/OpenJcePlusTests/build.xml#L40

Do we know if this branch name and repository are set correctly during release building / testing? It seems as if this would resolve to "semeru-java${params.JDK_VERSION}" at https://github.com/adoptium/aqa-tests/blob/dd8774091c090d89917b27653206a668c804a082/buildenv/jenkins/JenkinsfileBase#L111 unless we are overriding that value.

We have also been running into the parsing issue above, which results in the error "The incoming YAML document exceeds the limit: 3145728 code points." being printed. When this occurs, we don't get any of the output from stderr to actually see which test fails.

@JasonFengJ9 did mention that this is a known tooling issue described at https://groups.google.com/g/zaproxy-users/c/-x9nGLdkU5M
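For reference, the limit in the message is 3 × 1024 × 1024 = 3145728, counted in Unicode code points rather than bytes. A rough pre-check along the following lines could tell whether a generated result document would cross that threshold before it reaches the parser. This is only a sketch: the function name is hypothetical, the pre-check is not part of the Jenkins or TAP tooling, and whether a document of exactly the limit is accepted is an assumption.

```python
CODE_POINT_LIMIT = 3 * 1024 * 1024  # 3145728, the value in the error message


def exceeds_code_point_limit(text, limit=CODE_POINT_LIMIT):
    """Return True if text contains more code points than the limit.

    Python's len() on a str counts Unicode code points, not bytes, so a
    document of multi-byte characters is measured the same way as ASCII.
    Assumption: a document of exactly `limit` code points is still accepted.
    """
    return len(text) > limit
```

Because the count is in code points, the size of the result file on disk is only an approximation of how close a run is to the limit.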

@jasonkatonica
Contributor

I can also confirm that the issue addressed by the fix in adoptium/aqa-tests#5414 is indeed the cause of the runs always stopping on this test:

09:47:33       [test]     [junit] executing testSHA512
09:47:33       [test]     [junit] executing testSHA512
09:47:33       [test]     [junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time 

@pshipton
Member Author

pshipton commented Jul 3, 2024

When this occurs, we don't get any of the output from stderr to actually see which test fails.

I think the console output is there in the console log. I extracted the output and attached a file to #19785 (comment)

@pshipton
Member Author

pshipton commented Jul 3, 2024

Looking at the OE builds, it seems to be working, except for 22 which is still at the 22.0.1 level.

Cloning OpenJCEPlus version semeru-java-11.0.24 from https://github.com/ibmruntimes/OpenJCEPlus.git
Cloning OpenJCEPlus version semeru-java-17.0.12 from https://github.com/ibmruntimes/OpenJCEPlus.git
Cloning OpenJCEPlus version semeru-java-21.0.4 from https://github.com/ibmruntimes/OpenJCEPlus.git
Cloning OpenJCEPlus version semeru-java-22.0.1 from https://github.com/ibmruntimes/OpenJCEPlus.git
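To compare what each job cloned, the branch names can be pulled out of the console logs with a small script. The sketch below assumes the "Cloning OpenJCEPlus version ... from ..." line format shown above; the function name `cloned_branches` is hypothetical.

```python
import re

# Assumes the "Cloning OpenJCEPlus version <branch> from <url>" format above.
CLONE = re.compile(
    r"Cloning OpenJCEPlus version (semeru-java-(\d+)[\d.]*) from (\S+)"
)


def cloned_branches(lines):
    """Map each JDK feature version (e.g. 17) to the branch that was cloned."""
    branches = {}
    for line in lines:
        m = CLONE.search(line)
        if m:
            branches[int(m.group(2))] = m.group(1)
    return branches
```

The result only shows which branch was used per JDK version; deciding whether a branch is at the expected level still requires knowing the intended release versions.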

@pshipton
Member Author

pshipton commented Jul 4, 2024

@jasonkatonica
Contributor

The test case testRSAPSSInterop2 is a known problem that @taoliult is currently looking into further. When it does fail, it can result in the above:

17:08:27               [test]     [junit] executing testRSAPSSInterop2
17:08:27               [test]     [junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2,267.954 sec

This test occasionally hangs, and we are trying to explain why and fix it next, as it is our next most common test failure. I don't believe that this test failure is associated with the failures above in testSHA512; that test was failing due to dependency issues.

There are a few test environment problems that we are trying to fix to make this easier in the future.

  1. The current tests exit on the first test failure. We instead want all the tests to run to completion and display the results.
  2. We are installing mvn on various test machines. We are continually updating build and test dependencies (which caused the issue with testSHA512 and more test issues in the past), so to make this more reliable we want to call the tests via a mvn command, which will maintain dependencies the same way the open source project does. We will also be working with the test team on the best approach to cleaning up and maintaining dependencies on machines via mvn.
  3. We want to print all output to the logs. Failure stack traces in the log obviously make it easier to know why something failed. Currently, as far as I know, we don't have failure output unless Jenkins successfully parses the TAP results. (And currently there is a known bug where TAP does not parse the test results at times, resulting in "document exceeds the limit".) We can also work toward less output in the tests. This might help with the parsing issues, with the trade-off of having less debug output.

@jasonkatonica
Contributor

The intermittently failing test testRSAPSSInterop2 was removed from the OpenJCEPlus project for the time being via PR 127, to allow for a bit more stability so that more test runs complete.

@pshipton
Member Author

@KostasTsiounis can someone take a look at the jdk22 failures pls.

@KostasTsiounis
Contributor

@KostasTsiounis can someone take a look at the jdk22 failures pls.

We have seen this failure in our tests too and have opened an issue about it. The SHA512 multithreaded test fails, but only in FIPS mode and in OpenJDK22.

We will further investigate it, but, after talking to @jasonkatonica about it last week, we don't think it's a stop ship issue.

@pshipton
Member Author

Moved it to the 0.48 milestone.

@jasonkatonica
Contributor

There is a series of known multithreading problems in the tests themselves that are being worked on for the next release.

@KostasTsiounis has run the SHA512-related tests numerous times, and the failures are not observable on any releases currently in service. We therefore believe there was an unknown issue with older builds, which we do not plan on revisiting unless the issue shows up again on current release streams.

As for the parsing errors, there is a known issue that @JasonFengJ9 sent to me, discussed at https://groups.google.com/g/zaproxy-users/c/-x9nGLdkU5M, in the tooling we are using; it causes the parsing error "incoming YAML document exceeds the limit: 3145728 code points." seen above. We will work toward limiting high-output tests to help with this over time.

Moving this to the next release, as we don't feel that any of these issues should hold up the release.

@jasonkatonica
Contributor

We believe that the SHA512 multithreaded test should now be reliable and this issue could now be closed.

Additionally, the error "The incoming YAML document exceeds the limit: 3145728 code points." does not typically occur in a test run. If the logs get flooded with messages, it's possible that this may still occur; however, this won't be easy to predict, so we don't plan on making additional changes for this tooling issue at this time.


github-actions bot commented Dec 3, 2024

Issue Number: 19785
Status: Closed
Actual Components: test failure, comp:crypto
Actual Assignees: No one :(
PR Assignees: No one :(

4 participants