Travis build fails in s390x architecture #635
Comments
Hi @Rohanjos97 Thanks for reaching out. Is the failure occurring on s390x consistently, or is it intermittent? Apart from the 'communication error' message, I don't see anything else to indicate the reason for the failure. Please print out the start and stop logs to see if they contain any additional information. Thank you
/opt/ol/wlp/bin/server start || { ec=$?; echo "==> start failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/start.log >&2; exit $ec; }
...
/opt/ol/wlp/bin/server stop || { ec=$?; echo "==> stop failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/stop.log >&2; exit $ec; } |
Hi @leochr, yes, this failure is consistent in all our s390x Travis builds... In some of my previous failed builds, I was seeing this error at
There was
Here's the output of server stop command:
Let me know if anything else is required. |
@Rohanjos97 Thank you for providing the logs. @ymanton Error occurs during SCC generation (
|
No, this is a different kind of error. The JVM is crashing while executing JIT compiled code because of a floating point exception. FYI @r30shah. |
Thank you for the confirmation, Younes. The floating point exception occurs intermittently, but the server stop failure is consistent, at least on s390x. I think we should focus on the server stop failure during SCC generation. @Rohanjos97 Are the latest logs from starting and stopping the server within populate_scc.sh? I noticed a missing feature definition error (CWWKF0042E), so I wanted to clarify. Thank you |
I don't see any output from the JVM in any of the logs after That suggests that it was killed forcibly without giving the JVM a chance to handle the error. Typically that happens when the system is low on memory. Is it possible to check the kernel log after |
Regarding the failures in the JIT compiled code posted in #635 (comment), I see that the JVM tries to produce a system dump at the time of the failure but could not (maybe because the core pattern is not set inside the container, or because of some other limitation on the container). If this is consistently reproducible, would it be possible to get the core dump? I am not sure whether this is reproducible using one of the template servers; I will try a couple to see. But in order to diagnose the segmentation fault, I would need the core dump to see why it failed with the floating point error. |
Hi @leochr, sorry for the late response, yes the logs are from within populate_scc.sh file as I modified the code from inside my container using this |
@ymanton @r30shah Thank you for taking a look and suggesting the diagnostic materials required for further investigation. @Rohanjos97 Could you please gather them? Thank you |
Hi all, output of
I'm running these commands as the root user and tried installing sudo, but I'm still seeing the same error on the s390x architecture... output of
I'm trying to print the output of @r30shah Please refer to the attached file for the dump file that was in the location: Please let me know if I missed something or if more information is required. Thanks! |
Try
|
Hi @ymanton, I tried both commands,
And as for this command: Anything else you have in mind that I could try? Thanks! |
Just so we are on the same page, you are checking the journal, dmesg, The content of your |
I am checking it inside the container... Since it is the container which is failing, I assumed that all the logs would be inside the container... Do I have to run these commands in my travis build after the container fails? |
If the kernel is terminating the JVM because of lack of memory then the logs will not be in the container, they will be on the host. Try the previous suggestions on the Travis build node, not in the container. |
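For reference, a minimal sketch of the host-side check suggested above. It assumes the build node exposes its kernel log through dmesg or journalctl and that you are allowed to read it. find_oom_lines is a hypothetical helper name, and the patterns are typical OOM-killer messages, not an exhaustive list.

```shell
# Hypothetical helper: filter kernel log text on stdin and keep only lines
# that usually accompany an out-of-memory kill.
find_oom_lines() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -i -E 'out of memory|oom[-_ ]?kill' || true
}

# Run on the build host, not inside the container (may require root):
#   dmesg | find_oom_lines
#   journalctl -k --no-pager | find_oom_lines
```

If neither command prints anything on the host right after the failed build, an OOM kill becomes a less likely explanation, though it does not rule out other forced terminations.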
Hi @Rohanjos97 Regarding #635 (comment), I am looking for the system dump, which I think the original failure could not produce,
There may be something you need to do for core_pattern, or maybe some system limit is causing it (I am not sure). The one you shared is a Javacore, from which I cannot tell why we hit the floating point error. |
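A rough sketch of how the two settings mentioned here could be inspected from the shell that launches the JVM. show_core_settings is a hypothetical helper name. The sketch assumes a Linux environment with /proc mounted, and it only reports the values, it does not change them.

```shell
# Hypothetical helper: print the settings that decide whether and where
# the kernel writes a core file for a crashing process.
show_core_settings() {
  # A limit of 0 means no core file is written for processes started here.
  echo "core size limit (ulimit -c): $(ulimit -c)"
  if [ -r /proc/sys/kernel/core_pattern ]; then
    # The pattern is a kernel-wide setting, so a container sees the host's value.
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
  else
    echo "core_pattern: not readable"
  fi
}

show_core_settings
```

If the limit is reported as 0, running ulimit -c unlimited in the same shell before starting the server is one thing to try. If the pattern starts with a pipe character, the core is handed to a program on the host and will not show up as a file inside the container.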
Here's the output of Also, since we are talking about travis, I think I should mention that the travis is not
@r30shah can you please have a look at these logs and let me know if they are helpful? It seems I was looking in the wrong place earlier when I posted that dump. Let me know if you need anything else. Thanks! |
Ok, that's a pretty important detail. If you're using QEMU or some other emulator you should check its logs for errors. |
Hi Guys, I had shared the output of Meanwhile I'll try to fetch the QEMU logs as well |
Currently, our Dockerfile looks like this
And the build passes for the amd64 architecture. But when the same code executes for s390x architecture builds, we are seeing an error that originates from the populate_scc.sh file; specifically, the Travis build fails after the server stop command with this output:
Tried to debug it using the sed command by printing the console.log file after the server start and server stop commands:
This was the console.log output after the server start command:
I noticed that the [ERROR] message was being printed on the amd64 side as well (so it is probably not the cause of the failure in our build)...
The output of console.log after the server stop command:
We then checked whether the server was stopping successfully or whether there was an issue during stop, so we printed server stop || server status and the output was:
Even though the server stop command returned a non-zero code, the server stopped anyway according to the console log above. So, as a workaround for now, we are skipping the error for the s390x build by adding || true after the server stop command, which looks something like this:
I was hoping to get some more insight into why the build is failing and whether we can add similar logic to the code for s390x builds or, better, whether we can figure out what is causing the issue and fix it at the root.
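To make the "similar logic for s390x builds" idea concrete, here is one possible shape for it, as a sketch only. run_tolerant_on_s390x and ARCH_OVERRIDE are hypothetical names that do not exist in populate_scc.sh. The sketch assumes uname -m reports s390x inside the build container. Unlike a bare || true, it still fails on other architectures and logs the exit status it ignored.

```shell
# Hypothetical helper: run a command and, on s390x only, log a non-zero
# exit status instead of failing the build.
run_tolerant_on_s390x() {
  rc=0
  "$@" || rc=$?
  # ARCH_OVERRIDE is an illustrative test hook; it defaults to uname -m.
  if [ "$rc" -ne 0 ] && [ "${ARCH_OVERRIDE:-$(uname -m)}" = "s390x" ]; then
    echo "WARN: '$*' exited with $rc on s390x; continuing" >&2
    return 0
  fi
  return "$rc"
}

# Intended use in the script (server path as quoted in the comments):
#   run_tolerant_on_s390x /opt/ol/wlp/bin/server stop
```

This only masks the symptom in a narrower way. It does not explain why server stop returns a non-zero code on s390x.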