Travis build fails in s390x architecture #635

Open
Rohanjos97 opened this issue Sep 4, 2024 · 20 comments

Comments

@Rohanjos97

Currently, our Dockerfile looks like this:

FROM icr.io/appcafe/websphere-liberty:24.0.0.6-kernel-java11-openj9-ubi

...

COPY --chown=1001:0 server.xml /opt/ibm/wlp/usr/servers/defaultServer/server.xml
RUN configure.sh

...

The build passes for the amd64 architecture. But when the same code is built for s390x, we see an error originating from populate_scc.sh. Specifically, the Travis build fails after the server stop command with this output:

310.5 Stopping server defaultServer.
314.2 The command stop failed because of a communication error with the server.
314.2 Server defaultServer stop failed. Check server logs for details.

We tried to debug it by using sed to patch populate_scc.sh so that it prints console.log after the server start and server stop commands:

RUN sed -i "s|\/opt\/ibm\/wlp\/bin\/server start|\/opt\/ibm\/wlp\/bin\/server start \&\& ls \-al \/logs \&\& cat \/logs\/console.log|g" /opt/ibm/helpers/build/populate_scc.sh
RUN sed -i "s|\/opt\/ibm\/wlp\/bin\/server stop|\/opt\/ibm\/wlp\/bin\/server stop \&\& ls \-al \/logs \&\& cat \/logs\/console.log|g" /opt/ibm/helpers/build/populate_scc.sh

This was the console.log output after server start command:

#9 304.7 Launching defaultServer (WebSphere Application Server 24.0.0.6/wlp-1.0.90.cl240620240603-2001) on Eclipse OpenJ9 VM, version 11.0.24+8 (en_US)
#9 304.7 [AUDIT   ] CWWKE0001I: The server defaultServer has been launched.
#9 304.7 [AUDIT   ] CWWKE0100I: This product is licensed for development, and limited production use. The full license terms can be viewed here: https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/wasdev/license/base_ilan/ilan/24.0.0.6/lafiles/en.html
#9 304.7 [AUDIT   ] CWWKG0093A: Processing configuration drop-ins resource: /opt/ibm/wlp/usr/servers/defaultServer/configDropins/defaults/keystore.xml
#9 304.7 [AUDIT   ] CWWKZ0058I: Monitoring dropins for applications.
#9 304.7 [ERROR   ] CWSJY0035E: Unable to load classes from the WebSphere MQ resource adapter, because the wmqJmsClient.rar.location variable has not been set. Set this variable to point to the location of the MQ resource adapter
#9 304.7 [ERROR   ] CWWKE0701E: bundle com.ibm.ws.jca:1.0.90.cl240620240603-2001 (98)[com.ibm.ws.jca.internal.ResourceAdapterService(219)] : The activate method has thrown an exception java.lang.IllegalArgumentException: CWSJY0035E: Unable to load classes from the WebSphere MQ resource adapter, because the wmqJmsClient.rar.location variable has not been set. Set this variable to point to the location of the MQ resource adapter
#9 304.7 	at com.ibm.ws.wmq.WMQService.setClassLoaderID(WMQService.java:205)
#9 304.7 	at [internal classes]
#9 304.7   
#9 304.7 [AUDIT   ] CWWKS4104A: LTPA keys created in 7.658 seconds. LTPA key file: /opt/ibm/wlp/output/defaultServer/resources/security/ltpa.keys
#9 304.7 [AUDIT   ] CWPKI0803A: SSL certificate created in 23.795 seconds. SSL key file: /opt/ibm/wlp/output/defaultServer/resources/security/key.p12
#9 304.7 [AUDIT   ] CWWKI0001I: The CORBA name server is now available at corbaloc:iiop:localhost:2809/NameService.
#9 304.7 [ERROR   ] CWSJY0035E: Unable to load classes from the WebSphere MQ resource adapter, because the wmqJmsClient.rar.location variable has not been set. Set this variable to point to the location of the MQ resource adapter
#9 304.7 [ERROR   ] CWWKE0701E: bundle com.ibm.ws.jca:1.0.90.cl240620240603-2001 (98)[com.ibm.ws.jca.internal.ResourceAdapterService(219)] : The activate method has thrown an exception java.lang.IllegalArgumentException: CWSJY0035E: Unable to load classes from the WebSphere MQ resource adapter, because the wmqJmsClient.rar.location variable has not been set. Set this variable to point to the location of the MQ resource adapter
#9 304.7 	at com.ibm.ws.wmq.WMQService.setClassLoaderID(WMQService.java:205)
#9 304.7 	at [internal classes]
#9 304.7   
#9 304.7 [AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/jwt/
#9 304.7 [AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/oidcclient/
#9 304.7 [AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/oauth2/
#9 304.7 [AUDIT   ] CWWKF0012I: The server installed the following features: [appSecurity-2.0, appSecurity-3.0, beanValidation-2.0, cdi-2.0, distributedMap-1.0, ejbHome-3.2, ejbLite-3.2, ejbRemote-3.2, el-3.0, federatedRegistry-1.0, jaspic-1.1, javaMail-1.6, jaxb-2.2, jaxrs-2.1, jaxrsClient-2.1, jaxws-2.2, jca-1.7, jdbc-4.2, jmsMdb-3.2, jndi-1.0, jpa-2.2, jpaContainer-2.2, jsf-2.3, json-1.0, jsonb-1.0, jsonp-1.1, jsp-2.3, jwt-1.0, jwtSso-1.0, ldapRegistry-3.0, managedBeans-1.0, mdb-3.2, monitor-1.0, oauth-2.0, openidConnectClient-1.0, servlet-4.0, ssl-1.0, transportSecurity-1.0, wasJmsClient-2.0, wasJmsServer-1.0, webProfile-8.0, websocket-1.1, wmqJmsClient-2.0].
#9 304.7 [AUDIT   ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 95.428 seconds.

I noticed that the same [ERROR] messages are printed on the amd64 side as well, so they are probably not the cause of our build failure...

The output of console.log after server stop command:

#9 304.6 + cat /logs/console.log
#9 304.7 Launching defaultServer (WebSphere Application Server 24.0.0.6/wlp-1.0.90.cl240620240603-2001) on Eclipse OpenJ9 VM, version 11.0.24+8 (en_US)
#9 304.7 [AUDIT   ] CWWKE0001I: The server defaultServer has been launched.
#9 304.7 [AUDIT   ] CWWKE0100I: This product is licensed for development, and limited production use. The full license terms can be viewed here: https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/wasdev/license/base_ilan/ilan/24.0.0.6/lafiles/en.html
#9 304.7 [AUDIT   ] CWWKG0093A: Processing configuration drop-ins resource: /opt/ibm/wlp/usr/servers/defaultServer/configDropins/defaults/keystore.xml
#9 304.7 [AUDIT   ] CWWKZ0058I: Monitoring dropins for applications.
#9 304.7 [ERROR   ] CWSJY0035E: Unable to load classes from the WebSphere MQ resource adapter, because the wmqJmsClient.rar.location variable has not been set. Set this variable to point to the location of the MQ resource adapter
#9 304.7 [ERROR   ] CWWKE0701E: bundle com.ibm.ws.jca:1.0.90.cl240620240603-2001 (98)[com.ibm.ws.jca.internal.ResourceAdapterService(219)] : The activate method has thrown an exception java.lang.IllegalArgumentException: CWSJY0035E: Unable to load classes from the WebSphere MQ resource adapter, because the wmqJmsClient.rar.location variable has not been set. Set this variable to point to the location of the MQ resource adapter
#9 304.7 	at com.ibm.ws.wmq.WMQService.setClassLoaderID(WMQService.java:205)
#9 304.7 	at [internal classes]
#9 304.7   
#9 304.7 [AUDIT   ] CWWKS4104A: LTPA keys created in 7.658 seconds. LTPA key file: /opt/ibm/wlp/output/defaultServer/resources/security/ltpa.keys
#9 304.7 [AUDIT   ] CWPKI0803A: SSL certificate created in 23.795 seconds. SSL key file: /opt/ibm/wlp/output/defaultServer/resources/security/key.p12
#9 304.7 [AUDIT   ] CWWKI0001I: The CORBA name server is now available at corbaloc:iiop:localhost:2809/NameService.
#9 304.7 [ERROR   ] CWSJY0035E: Unable to load classes from the WebSphere MQ resource adapter, because the wmqJmsClient.rar.location variable has not been set. Set this variable to point to the location of the MQ resource adapter
#9 304.7 [ERROR   ] CWWKE0701E: bundle com.ibm.ws.jca:1.0.90.cl240620240603-2001 (98)[com.ibm.ws.jca.internal.ResourceAdapterService(219)] : The activate method has thrown an exception java.lang.IllegalArgumentException: CWSJY0035E: Unable to load classes from the WebSphere MQ resource adapter, because the wmqJmsClient.rar.location variable has not been set. Set this variable to point to the location of the MQ resource adapter
#9 304.7 	at com.ibm.ws.wmq.WMQService.setClassLoaderID(WMQService.java:205)
#9 304.7 	at [internal classes]
#9 304.7   
#9 304.7 [AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/jwt/
#9 304.7 [AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/oidcclient/
#9 304.7 [AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/oauth2/
#9 304.7 [AUDIT   ] CWWKF0012I: The server installed the following features: [appSecurity-2.0, appSecurity-3.0, beanValidation-2.0, cdi-2.0, distributedMap-1.0, ejbHome-3.2, ejbLite-3.2, ejbRemote-3.2, el-3.0, federatedRegistry-1.0, jaspic-1.1, javaMail-1.6, jaxb-2.2, jaxrs-2.1, jaxrsClient-2.1, jaxws-2.2, jca-1.7, jdbc-4.2, jmsMdb-3.2, jndi-1.0, jpa-2.2, jpaContainer-2.2, jsf-2.3, json-1.0, jsonb-1.0, jsonp-1.1, jsp-2.3, jwt-1.0, jwtSso-1.0, ldapRegistry-3.0, managedBeans-1.0, mdb-3.2, monitor-1.0, oauth-2.0, openidConnectClient-1.0, servlet-4.0, ssl-1.0, transportSecurity-1.0, wasJmsClient-2.0, wasJmsServer-1.0, webProfile-8.0, websocket-1.1, wmqJmsClient-2.0].
#9 304.7 [AUDIT   ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 95.428 seconds.
#9 304.7 + '[' true == true ']'
#9 304.7 + curl --silent --output /dev/null --show-error --fail --max-time 5 localhost:9080/
#9 307.0 + '[' true == true ']'
#9 307.0 + curl --silent --output /dev/null --show-error --fail --max-time 5 localhost:9080/openapi
#9 307.2 curl: (22) The requested URL returned error: 404 Not Found
#9 307.2 + echo 'localhost:9080/openapi call failed, continuing'
#9 307.2 + /opt/ibm/wlp/bin/server stop
#9 307.2 localhost:9080/openapi call failed, continuing
#9 307.7 
#9 310.5 Stopping server defaultServer.
#9 314.2 The command stop failed because of a communication error with the server.
#9 314.2 Server defaultServer stop failed. Check server logs for details.

We then checked whether the server actually stops or whether something goes wrong during the stop, so we ran server stop || server status, and the output was:

#9 340.8 + /opt/ibm/wlp/bin/server status
#9 341.3 
#9 344.5 Server defaultServer is not running.

Even though the server stop command returned a non-zero exit code, the server did stop, according to the output above. As a workaround for now, we skip the error for s390x builds (any non-amd64 architecture, per the check below) by appending || true to the server stop command:

RUN if [ "$ARCHITECTURE" != "amd64" ] ; then \
        sed -i "s|\/opt\/ibm\/wlp\/bin\/server stop|\/opt\/ibm\/wlp\/bin\/server stop \|\| true|g" /opt/ibm/helpers/build/populate_scc.sh; \
    fi
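
For reference, a narrower variant of this workaround could ignore the stop failure only when a follow-up status check confirms the server is down, instead of ignoring every failure with || true. This is a minimal sketch and not something taken from populate_scc.sh: the function name stop_server_tolerant and the SERVER_CMD variable are made up for illustration, and it matches on the "is not running" text shown in the status output above rather than on exit codes.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Liberty image): tolerate a failed
# "server stop" only if "server status" then reports the server as down.
# SERVER_CMD is an assumed variable; the default matches the path used in this issue.
SERVER_CMD="${SERVER_CMD:-/opt/ibm/wlp/bin/server}"

stop_server_tolerant() {
    # Run the stop command and remember its exit code without aborting.
    stop_rc=0
    "$SERVER_CMD" stop || stop_rc=$?
    if [ "$stop_rc" -eq 0 ]; then
        return 0
    fi

    # Stop reported a failure: ask for the status and inspect its text.
    status_out=$("$SERVER_CMD" status 2>&1) || true
    case "$status_out" in
        *"is not running"*)
            echo "stop exited with $stop_rc but the server is down; continuing" >&2
            return 0
            ;;
    esac

    # The server may still be running, so propagate the original failure.
    echo "stop exited with $stop_rc and the server may still be running" >&2
    return "$stop_rc"
}
```

If something like this were sourced by the build script, the plain server stop call would become stop_server_tolerant. A stop that fails while the server is still up would then still break the build.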

I was hoping to get some more insight into why the build is failing, and whether similar logic could be added to the code for s390x builds or, better, whether we can figure out what is causing the issue and fix it at the source.

@leochr
Member

leochr commented Sep 4, 2024

Hi @Rohanjos97 Thanks for reaching out.

Is the failure occurring on s390x consistently, or is it intermittent?
When it fails, does it always fail on the server stop command (in populate_scc.sh), or does it also fail on server start?

Apart from the 'communication error' message, I don't see anything else to indicate the reason for the failure.

Please print out the start and stop logs to see if they contain any additional information. Thank you

  /opt/ol/wlp/bin/server start  || { ec=$?; echo "==> start failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/start.log >&2; exit $ec; }

...

  /opt/ol/wlp/bin/server stop  || { ec=$?; echo "==> stop failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/stop.log >&2; exit $ec; }
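
The two commands above repeat the same log-dumping steps, so they could be factored into one small function. This is only a sketch: dump_logs and LOG_DIR are hypothetical names, not part of the image's helper scripts. It also prints a marker before each file so that the messages.log and console.log output are easier to tell apart in the build log.

```shell
#!/bin/sh
# Hypothetical helper: print a listing of the log directory and the
# requested log files to stderr, with a marker line before each file.
# LOG_DIR is an assumed variable; /logs is the directory used above.
LOG_DIR="${LOG_DIR:-/logs}"

dump_logs() {
    # $1: label of the step that failed; remaining args: log file names
    label=$1
    shift
    echo "==> $label failed" >&2
    ls -la "$LOG_DIR" >&2
    for f in "$@"; do
        if [ -f "$LOG_DIR/$f" ]; then
            echo "----- $f -----" >&2
            cat "$LOG_DIR/$f" >&2
        else
            # A missing file is reported instead of producing a cat error.
            echo "----- $f (missing) -----" >&2
        fi
    done
}
```

With this in place, the stop check would read: server stop || { ec=$?; dump_logs stop messages.log console.log stop.log; exit $ec; }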

@Rohanjos97
Author

Hi @leochr,

Yes, this failure is consistent in all our s390x travis builds...

In some of my previous failed builds, I saw this intermittent error at the RUN configure.sh step. I couldn't make much of it, but maybe it is helpful?

#8 [ 4/28] RUN configure.sh
#8 0.337 WARNING: This is not an optimal build configuration. Although features in server.xml will continue to be installed correctly, the 'RUN features.sh' command should be added to the Dockerfile prior to configure.sh. See https://ibm.biz/wl-app-image-template for a sample application image template.
#8 0.394 + main
#8 0.398 + WLP_INSTALL_DIR=/opt/ibm/wlp
#8 0.401 + SHARED_CONFIG_DIR=/opt/ibm/wlp/usr/shared/config
#8 0.402 + SHARED_RESOURCE_DIR=/opt/ibm/wlp/usr/shared/resources
#8 0.402 + SNIPPETS_SOURCE=/opt/ibm/helpers/build/configuration_snippets
#8 0.402 + SNIPPETS_TARGET=/config/configDropins/overrides
#8 0.402 + SNIPPETS_TARGET_DEFAULTS=/config/configDropins/defaults
#8 0.404 + mkdir -p /config/configDropins/overrides
#8 0.470 + mkdir -p /config/configDropins/defaults
#8 0.543 + '[' false == false ']'
#8 0.544 + '[' '' == true ']'
#8 0.545 + '[' false == true ']'
#8 0.545 + '[' '' == true ']'
#8 0.545 + '[' false == true ']'
#8 0.546 + '[' false == true ']'
#8 0.546 + '[' '' == true ']'
#8 0.547 + '[' false == true ']'
#8 0.547 + '[' false == true ']'
#8 0.549 + [[ -n '' ]]
#8 0.550 + '[' false == client ']'
#8 0.551 + '[' false == embedded ']'
#8 0.551 + [[ -n '' ]]
#8 0.551 + '[' '' == true ']'
#8 0.551 + '[' '' == true ']'
#8 0.551 + keystorePath=/config/configDropins/defaults/keystore.xml
#8 0.552 + '[' '' '!=' false ']'
#8 0.552 + '[' '' '!=' false ']'
#8 0.552 + '[' '!' -e /config/configDropins/defaults/keystore.xml ']'
#8 0.562 ++ openssl rand -base64 32
#8 0.643 + export KEYSTOREPWD=sEW5TT/5rW6HMMbCEVGTonGftfCNnGHukGSqomgbguA=
#8 0.645 + KEYSTOREPWD=sEW5TT/5rW6HMMbCEVGTonGftfCNnGHukGSqomgbguA=
#8 0.648 + sed 's|REPLACE|sEW5TT/5rW6HMMbCEVGTonGftfCNnGHukGSqomgbguA=|g' /opt/ibm/helpers/build/configuration_snippets/keystore.xml
#8 0.725 + chmod g+w /config/configDropins/defaults/keystore.xml
#8 0.785 + [[ -n '' ]]
#8 0.785 + '[' '' '!=' true ']'
#8 0.785 + '[' false == false ']'
#8 0.786 + '[' '' ']'
#8 0.786 + installUtility install --acceptLicense defaultServer
#8 6.011 Checking for missing features required by the server ...
#8 6.494 Establishing a connection to the configured repositories ...
#8 6.494 This process might take several minutes to complete.
#8 6.494 
#8 12.49 Successfully connected to all configured repositories.
#8 12.49 
#8 12.50 Preparing assets for installation. This process might take several minutes to complete.
#8 44.66 Unhandled exception
#8 44.66 Type=Floating point error vmState=0x00000000
#8 44.66 J9Generic_Signal_Number=00000088 Signal_Number=00000008 Error_Value=00000000 Signal_Code=00000000
#8 44.66 Handler1=0000004008DE5F50 Handler2=000000400909FA60
#8 44.66 gpr0=0000000000000000 gpr1=000000000D4A85C8 gpr2=000000000DA661A8 gpr3=00000000F4BBBA48
#8 44.66 gpr4=0000000000000000 gpr5=000000000016C1A0 gpr6=0000000000000000 gpr7=0000000000000000
#8 44.66 gpr8=00000001E97775C0 gpr9=000000000DA661A8 gpr10=000000000DA661A8 gpr11=00000001E9777480
#8 44.66 gpr12=00000001E9777480 gpr13=000000000008E200 gpr14=000000401D6951B8 gpr15=0000004000AB5B98
#8 44.66 psw=000000401D695322 mask=0705200180000000 fpc=00000000 bea=0000000000000000
#8 44.67 fpr0=000000400a1c9dc8 (f: 169647552.000000, d: 1.358915e-312)
#8 44.67 fpr1=ff00000000000000 (f: 0.000000, d: -5.486124e+303)
#8 44.67 fpr2=5700000000000000 (f: 0.000000, d: 1.202454e+111)
#8 44.67 fpr3=2020202020202020 (f: 538976256.000000, d: 6.013470e-154)
#8 44.67 fpr4=617ae0f6f8fe0000 (f: 4177395712.000000, d: 3.778914e+161)
#8 44.67 fpr5=a0c0a0c0a0c00000 (f: 2696937472.000000, d: -6.349662e-151)
#8 44.67 fpr6=dfdfb5b5ffff0000 (f: 4294901760.000000, d: -6.643110e+153)
#8 44.67 fpr7=80808080a0a00000 (f: 2694840320.000000, d: -2.937447e-306)
#8 44.67 fpr8=00000000000644c0 (f: 410816.000000, d: 2.029701e-318)
#8 44.67 fpr9=0000000000000000 (f: 0.000000, d: 0.000000e+00)
#8 44.67 fpr10=0000000000064be0 (f: 412640.000000, d: 2.038712e-318)
#8 44.67 fpr11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
#8 44.67 fpr12=0000004000ab7910 (f: 11237648.000000, d: 1.358133e-312)
#8 44.67 fpr13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
#8 44.67 fpr14=0000000000040000 (f: 262144.000000, d: 1.295163e-318)
#8 44.67 fpr15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
#8 44.67 
#8 44.67 Compiled_method=org/glassfish/json/JsonObjectBuilderImpl$JsonObjectImpl.getString(Ljava/lang/String;)Ljava/lang/String;
#8 44.67 Target=2_90_20240802_1032 (Linux 4.19.86)
#8 44.67 CPU=s390x (8 logical CPUs) (0x77c0c1000 RAM)
#8 44.67 ----------- Stack Backtrace -----------
#8 44.67  (0x000000401D695322 [<unknown>+0x0])
#8 44.67 ---------------------------------------
#8 44.68 JVMDUMP039I Processing dump event "gpf", detail "" at 2024/08/29 17:20:27 - please wait.
#8 44.69 JVMDUMP032I JVM requested System dump using '//core.20240829.172027.40.0001.dmp' in response to an event
#8 46.10 JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/lib/systemd/coredump-wrapper %E %P %u %g %s %t %c %h %e" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.78.  Review the manual for the external program to find where the core dump is written and ensure the program does not truncate it.
#8 46.10 
#8 52.02 JVMPORT049I The core file created by child process with pid = 78 was not found. Review the documentation for the /proc/sys/kernel/core_pattern program "|/usr/lib/systemd/coredump-wrapper %E %P %u %g %s %t %c %h %e" to find where the core file is written and ensure that program does not truncate it.
#8 52.02 
#8 52.05 JVMDUMP012E Error in System dump: //core.20240829.172027.40.0001.dmp
#8 52.06 JVMDUMP032I JVM requested Java dump using '//javacore.20240829.172027.40.0002.txt' in response to an event
#8 52.22 /opt/ibm/wlp/bin/installUtility: line 218:    40 Real-time signal 0      "${JAVA_CMD}" "$@"
#8 52.22 + rc=162
#8 52.23 + '[' 162 -ne 22 ']'
#8 52.23 + exit 162
#8 ERROR: process "/dev/.buildkit_qemu_emulator /bin/sh -c configure.sh" did not complete successfully: exit code: 162

There was no /opt/ol/wlp/bin/server in our image, so I changed the path to /opt/ibm/wlp/bin/server to match our install, and the output of server start was:

#7 [ 3/30] RUN /opt/ibm/wlp/bin/server start  || { ec=$?; echo "==> start failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/start.log >&2; exit $ec; }
#7 0.625 
#7 4.067 Starting server defaultServer.
#7 12.03 Server defaultServer started with process ID 43.
#7 DONE 12.3s

Here's the output of server stop command:

#8 [ 4/30] RUN /opt/ibm/wlp/bin/server stop  || { ec=$?; echo "==> stop failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/stop.log >&2; exit $ec; }
#8 0.592 
#8 3.624 Stopping server defaultServer.
#8 3.684 Server defaultServer is not running.
#8 3.878 ==> stop failed
#8 3.968 total 112
#8 3.968 drwxrwxr-x 1 default root    22 Sep  5 07:38 .
#8 3.968 drwxr-xr-x 1 root    root    40 Sep  5 07:37 ..
#8 3.968 -rw-r----- 1 root    root  6299 Sep  5 07:37 console.log
#8 3.968 -rw-r----- 1 root    root  9061 Sep  5 07:37 messages.log
#8 3.968 -rw-r----- 1 root    root   372 Sep  5 07:38 stop.log
#8 3.968 -rw-r----- 1 root    root 90080 Sep  5 07:37 verbosegc.001.log
#8 4.029 ********************************************************************************
#8 4.029 product = WebSphere Application Server 24.0.0.6 (wlp-1.0.90.cl240620240603-2001)
#8 4.029 wlp.install.dir = /opt/ibm/wlp/
#8 4.029 server.output.dir = /opt/ibm/wlp/output/defaultServer/
#8 4.029 java.home = /opt/java/openjdk
#8 4.029 java.version = 11.0.24
#8 4.029 java.runtime = IBM Semeru Runtime Open Edition (11.0.24+8)
#8 4.029 os = Linux (4.19.86; s390x) (en_US)
#8 4.029 process = [email protected]
#8 4.029 Classpath = /opt/ibm/wlp/bin/tools/ws-server.jar
#8 4.029 Java Library path = /opt/java/openjdk/lib/default:/opt/java/openjdk/lib:/usr/lib64:/usr/lib
#8 4.029 ********************************************************************************
#8 4.029 [9/5/24, 7:37:50:865 UTC] 00000001 com.ibm.ws.kernel.launch.internal.FrameworkManager           A CWWKE0001I: The server defaultServer has been launched.
#8 4.029 [9/5/24, 7:37:50:963 UTC] 00000001 com.ibm.ws.kernel.launch.internal.FrameworkManager           A CWWKE0100I: This product is licensed for development, and limited production use. The full license terms can be viewed here: https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/wasdev/license/base_ilan/ilan/24.0.0.6/lafiles/en.html
#8 4.029 [9/5/24, 7:37:56:327 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            I CWWKF0007I: Feature update started.
#8 4.029 [9/5/24, 7:37:56:340 UTC] 00000001 com.ibm.ws.kernel.launch.internal.FrameworkManager           I CWWKE0002I: The kernel started after 7.186 seconds
#8 4.029 [9/5/24, 7:37:56:622 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the ssl-1.0 feature.  Try running the command, bin/installUtility install ssl-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:631 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the wmqjmsclient-2.0 feature.  Try running the command, bin/installUtility install wmqjmsclient-2.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:734 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the jndi-1.0 feature.  Try running the command, bin/installUtility install jndi-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:812 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the ejbremote-3.2 feature.  Try running the command, bin/installUtility install ejbremote-3.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:819 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the openidconnectclient-1.0 feature.  Try running the command, bin/installUtility install openidconnectclient-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:826 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the jdbc-4.2 feature.  Try running the command, bin/installUtility install jdbc-4.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:833 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the jaxws-2.2 feature.  Try running the command, bin/installUtility install jaxws-2.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:839 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the jsonp-1.1 feature.  Try running the command, bin/installUtility install jsonp-1.1, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:847 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the javamail-1.6 feature.  Try running the command, bin/installUtility install javamail-1.6, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:853 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the ejbhome-3.2 feature.  Try running the command, bin/installUtility install ejbhome-3.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:861 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the wasjmsclient-2.0 feature.  Try running the command, bin/installUtility install wasjmsclient-2.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:867 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the transportsecurity-1.0 feature.  Try running the command, bin/installUtility install transportsecurity-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:874 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the webprofile-8.0 feature.  Try running the command, bin/installUtility install webprofile-8.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:881 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the jwtsso-1.0 feature.  Try running the command, bin/installUtility install jwtsso-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:894 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the monitor-1.0 feature.  Try running the command, bin/installUtility install monitor-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:929 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the wasjmsserver-1.0 feature.  Try running the command, bin/installUtility install wasjmsserver-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:56:942 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            E CWWKF0042E: A feature definition cannot be found for the jmsmdb-3.2 feature.  Try running the command, bin/installUtility install jmsmdb-3.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.029 [9/5/24, 7:37:57:186 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            A CWWKF0012I: The server installed the following features: [].
#8 4.029 [9/5/24, 7:37:57:191 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            I CWWKF0008I: Feature update completed in 0.885 seconds.
#8 4.029 [9/5/24, 7:37:57:193 UTC] 00000034 com.ibm.ws.kernel.feature.internal.FeatureManager            A CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 8.062 seconds.
#8 4.092 Launching defaultServer (WebSphere Application Server 24.0.0.6/wlp-1.0.90.cl240620240603-2001) on Eclipse OpenJ9 VM, version 11.0.24+8 (en_US)
#8 4.092 [AUDIT   ] CWWKE0001I: The server defaultServer has been launched.
#8 4.092 [AUDIT   ] CWWKE0100I: This product is licensed for development, and limited production use. The full license terms can be viewed here: https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/wasdev/license/base_ilan/ilan/24.0.0.6/lafiles/en.html
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the ssl-1.0 feature.  Try running the command, bin/installUtility install ssl-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the wmqjmsclient-2.0 feature.  Try running the command, bin/installUtility install wmqjmsclient-2.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the jndi-1.0 feature.  Try running the command, bin/installUtility install jndi-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the ejbremote-3.2 feature.  Try running the command, bin/installUtility install ejbremote-3.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the openidconnectclient-1.0 feature.  Try running the command, bin/installUtility install openidconnectclient-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the jdbc-4.2 feature.  Try running the command, bin/installUtility install jdbc-4.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the jaxws-2.2 feature.  Try running the command, bin/installUtility install jaxws-2.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the jsonp-1.1 feature.  Try running the command, bin/installUtility install jsonp-1.1, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the javamail-1.6 feature.  Try running the command, bin/installUtility install javamail-1.6, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the ejbhome-3.2 feature.  Try running the command, bin/installUtility install ejbhome-3.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the wasjmsclient-2.0 feature.  Try running the command, bin/installUtility install wasjmsclient-2.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the transportsecurity-1.0 feature.  Try running the command, bin/installUtility install transportsecurity-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the webprofile-8.0 feature.  Try running the command, bin/installUtility install webprofile-8.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the jwtsso-1.0 feature.  Try running the command, bin/installUtility install jwtsso-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the monitor-1.0 feature.  Try running the command, bin/installUtility install monitor-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the wasjmsserver-1.0 feature.  Try running the command, bin/installUtility install wasjmsserver-1.0, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [ERROR   ] CWWKF0042E: A feature definition cannot be found for the jmsmdb-3.2 feature.  Try running the command, bin/installUtility install jmsmdb-3.2, to install the feature. Alternatively, you can run the command, bin/installUtility install defaultServer, to install all features that are referenced by this configuration.
#8 4.092 [AUDIT   ] CWWKF0012I: The server installed the following features: [].
#8 4.092 [AUDIT   ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 8.062 seconds.
#8 4.159 
#8 4.159 arg0=defaultServer
#8 4.159 arg1=--pid=43
#8 4.159 arg2=--stop
#8 4.159 exit=1
#8 4.159 
#8 4.159                  Java home:  /opt/java/openjdk
#8 4.159               Install root:  /opt/ibm/wlp/
#8 4.159           System libraries:  /opt/ibm/wlp/lib/
#8 4.159                  User root:  /opt/ibm/wlp/usr/
#8 4.159                     Config:  /opt/ibm/wlp/usr/servers/defaultServer/
#8 4.159                     Output:  /opt/ibm/wlp/output/defaultServer/
#8 4.159 
#8 ERROR: process "/dev/.buildkit_qemu_emulator /bin/sh -c /opt/ibm/wlp/bin/server stop  || { ec=$?; echo \"==> stop failed\" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/stop.log >&2; exit $ec; }" did not complete successfully: exit code: 1
------
 > [ 4/30] RUN /opt/ibm/wlp/bin/server stop  || { ec=$?; echo "==> stop failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/stop.log >&2; exit $ec; }:
4.159 arg2=--stop
4.159 exit=1
4.159 
4.159                  Java home:  /opt/java/openjdk
4.159               Install root:  /opt/ibm/wlp/
4.159           System libraries:  /opt/ibm/wlp/lib/
4.159                  User root:  /opt/ibm/wlp/usr/
4.159                     Config:  /opt/ibm/wlp/usr/servers/defaultServer/
4.159                     Output:  /opt/ibm/wlp/output/defaultServer/
4.159 
------
Dockerfile:52
--------------------
  50 |     RUN /opt/ibm/wlp/bin/server start  || { ec=$?; echo "==> start failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/start.log >&2; exit $ec; }
  51 |     
  52 | >>> RUN /opt/ibm/wlp/bin/server stop  || { ec=$?; echo "==> stop failed" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/stop.log >&2; exit $ec; }
  53 |     
  54 |     RUN sed -i "s|\/opt\/ibm\/wlp\/bin\/server start|\/opt\/ibm\/wlp\/bin\/server start || { ec\=\$\?\; echo \"==> start failed\" \>\&2\; ls \-la \/logs \>\&2\; cat \/logs\/messages.log \>\&2\; cat \/logs\/console.log \>\&2\; cat \/logs\/start.log \>\&2\; exit \$ec\; }|g" /opt/ibm/helpers/build/populate_scc.sh
--------------------
ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c /opt/ibm/wlp/bin/server stop  || { ec=$?; echo \"==> stop failed\" >&2; ls -la /logs >&2; cat /logs/messages.log >&2; cat /logs/console.log >&2; cat /logs/stop.log >&2; exit $ec; }" did not complete successfully: exit code: 1

Let me know if anything else is required.
Thanks!

@leochr
Member

leochr commented Sep 6, 2024

@Rohanjos97 Thank you for providing the logs.

@ymanton The error occurs during SCC generation (populate_scc.sh). I noticed the errors below. Could you please take a look to see if this is the same as the other issue (segmentation error / findSharedClassImpl3)? Thank you.

#8 12.50 Preparing assets for installation. This process might take several minutes to complete.
#8 44.66 Unhandled exception
#8 44.66 Type=Floating point error vmState=0x00000000
#8 44.66 J9Generic_Signal_Number=00000088 Signal_Number=00000008 Error_Value=00000000 Signal_Code=00000000
....
....
----------- Stack Backtrace -----------
#8 44.67  (0x000000401D695322 [<unknown>+0x0])
#8 44.67 ---------------------------------------
#8 44.68 JVMDUMP039I Processing dump event "gpf", detail "" at 2024/08/29 17:20:27 - please wait.
#8 44.69 JVMDUMP032I JVM requested System dump using '//core.20240829.172027.40.0001.dmp' in response to an event
#8 46.10 JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/lib/systemd/coredump-wrapper %E %P %u %g %s %t %c %h %e" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.78.  Review the manual for the external program to find where the core dump is written and ensure the program does not truncate it.
#8 46.10 
#8 52.02 JVMPORT049I The core file created by child process with pid = 78 was not found. Review the documentation for the /proc/sys/kernel/core_pattern program "|/usr/lib/systemd/coredump-wrapper %E %P %u %g %s %t %c %h %e" to find where the core file is written and ensure that program does not truncate it.
#8 52.02 
#8 52.05 JVMDUMP012E Error in System dump: //core.20240829.172027.40.0001.dmp
#8 52.06 JVMDUMP032I JVM requested Java dump using '//javacore.20240829.172027.40.0002.txt' in response to an event

@ymanton
Contributor

ymanton commented Sep 6, 2024

No, this is a different kind of error. The JVM is crashing while executing JIT compiled code because of a floating point exception. FYI @r30shah.

@leochr
Member

leochr commented Sep 6, 2024

Thank you for the confirmation Younes.

The floating point exception occurs intermittently, but the server stop failure is consistent, at least on s390x. I think we should focus on the server stop failure during SCC generation.

@Rohanjos97 Are the latest logs from starting and stopping the server within populate_scc.sh? I noticed some missing feature definition errors (CWWKF0042E), so I wanted to clarify. Thank you.

@ymanton
Contributor

ymanton commented Sep 7, 2024

I don't see any output from the JVM in any of the logs after server stop, but the process looks like it's no longer running.

That suggests that it was killed forcibly without giving the JVM a chance to handle the error. Typically that happens when the system is low on memory.

Is it possible to check the kernel log after server stop fails? Can you execute the dmesg command once the container image build fails and capture the output? If dmesg doesn't work on Travis, something like sudo journalctl might.

@r30shah

r30shah commented Sep 9, 2024

Regarding the failure in the JIT-compiled code posted in #635 (comment): I see that the JVM tried to produce a system dump at the time of the failure but could not (maybe because the core pattern is not set inside the container, or because of some other limitation of the container). If this is consistently reproducible, would it be possible to get the core dump?

I am not sure if this is reproducible using one of the template servers. I will try a couple to see, but to diagnose the crash we would need the core dump to see why it failed with a floating point error.
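
The JVMPORT030W message above points at the kernel's core_pattern setting: a value that starts with | tells the kernel to pipe the dump to an external program instead of writing a file. A minimal shell sketch for checking this before trying to reproduce the crash; the function names are hypothetical and not part of any Liberty or OpenJ9 tooling:

```shell
# Return success when a core_pattern value pipes dumps to an external program.
core_pattern_is_piped() {
    case "$1" in
        '|'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Print the core dump settings visible to the current process.
report_core_settings() {
    pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null)
    echo "core_pattern: ${pattern:-<unreadable>}"
    echo "core size limit: $(ulimit -c)"
    if core_pattern_is_piped "$pattern"; then
        echo "core dumps are piped to an external program"
    else
        echo "core dumps are written as files"
    fi
}
```

Calling report_core_settings in the failing build step would show whether the dump is being redirected and whether the core size limit is 0.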

@Rohanjos97
Author

@Rohanjos97 Are the latest logs from starting and stopping the server within populate_scc.sh? I noticed some missing feature definition errors (CWWKF0042E), so I wanted to clarify. Thank you.

Hi @leochr, sorry for the late response. Yes, the logs are from within the populate_scc.sh file; I modified it from inside my container using this sed command:
RUN sed -i "s|\/opt\/ibm\/wlp\/bin\/server stop|\/opt\/ibm\/wlp\/bin\/server stop \|\| cat \/logs\/console.log \&\& \/opt\/ibm\/wlp\/bin\/server status|g" /opt/ibm/helpers/build/populate_scc.sh
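
A simplified sketch of the same kind of substitution, for trying it on a scratch copy first. It uses # as the sed delimiter so the paths and || need no escaping, and it preserves the exit code of server stop in the same way as the Dockerfile line shown earlier. The function name is hypothetical, and the result is printed to stdout rather than edited in place:

```shell
# Append a diagnostic fallback to every "server stop" call in the given script.
# '&' in the replacement is the matched text; '\&' is a literal ampersand.
add_stop_diagnostics() {
    sed 's#/opt/ibm/wlp/bin/server stop#& || { ec=$?; cat /logs/console.log >\&2; exit $ec; }#g' "$1"
}
```

Running add_stop_diagnostics on a copy of populate_scc.sh and comparing it with the original makes it easy to confirm the rewritten line before switching to sed -i.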

@leochr
Member

leochr commented Sep 10, 2024

@ymanton @r30shah Thank you for taking a look and suggesting the diagnostic materials required for further investigation.

@Rohanjos97 Could you please gather them? Thank you

@Rohanjos97
Author

Rohanjos97 commented Sep 11, 2024

Hi all,
I tried to get the logs using the dmesg and journalctl commands as suggested by @ymanton, but I was not able to fetch any logs with dmesg.

output of sudo dmesg :

#8 329.0 Stopping server defaultServer.
#8 332.6 The command stop failed because of a communication error with the server.
#8 332.6 Server defaultServer stop failed. Check server logs for details.
#8 332.7 + sudo dmesg
#8 332.8 dmesg: read kernel buffer failed: Operation not permitted
#8 332.8 + cat /logs/console.log

I'm running these commands as the root user and also tried installing sudo, but I still see the same error for the s390x architecture...

output of journalctl :

#8 301.7 + journalctl
#8 301.8 No journal files were found.
#8 301.8 -- No entries --

I'm still trying to print the output of dmesg as suggested by @ymanton. Once I manage to get past the sudo error, I'll post it here. Meanwhile, let me know if you want me to try some other commands.

@r30shah Please refer to the attached file for the dump file that was in the location: opt/ibm/wlp/output/defaultServer/javacore.20240911.071456.259.0002.txt.

Please let me know if I missed something or if more information is required.

dump_file.txt

Thanks!

@Rohanjos97
Author

Hi All,
I resolved the sudo error that I was getting earlier when using the dmesg command, but it seems it wasn't a sudo error but something else. I've been trying to work it out, but couldn't get any logs to print even after fixing sudo.

I tried to install rsyslog and check if I can print logs from /var/log/syslog, but nothing shows up there either.

These are all the files available at the /var/log location; please let me know if you require any of these logs to investigate.

Screenshot 2024-09-13 at 1 12 44 PM

Thanks!

@ymanton
Contributor

ymanton commented Sep 13, 2024

Try sudo journalctl rather than just journalctl. If that still doesn't produce anything try:

sudo grep -r "Killed process" /var/log/*

@Rohanjos97
Author

Hi @ymanton,

I tried both commands,

sudo journalctl still returns same output:

#15 330.3 + sudo journalctl
#15 330.6 No journal files were found.
#15 330.6 -- No entries --

And as for this command: sudo grep -r "Killed process" /var/log/*, I got no matching result in those logs.

Anything else you have in mind that I could try?

Thanks!

@ymanton
Contributor

ymanton commented Sep 13, 2024

Just so we are on the same page, you are checking the journal, dmesg, /var/log, etc on the Travis build node, not inside the build container where server stop fails, right?

The content of your /var/log above shows stuff like dnf.log and rhsm which I would expect on RedHat OSes and inside our UBI-based containers, but not on Travis build nodes which run Ubuntu.
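
One quick way to tell which environment a command is running in is to read the distribution ID from the os-release file. A minimal sketch, assuming the file exists at the usual path; the function name is hypothetical. On Ubuntu this prints ubuntu, while Red Hat-based images typically report rhel:

```shell
# Print the ID field of an os-release file (default: /etc/os-release).
os_id() {
    file=${1:-/etc/os-release}
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$file" && printf '%s\n' "$ID" )
}
```

Running os_id right before dmesg or journalctl in the build script shows whether the step is executing on the build node or inside the image being built.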

@Rohanjos97
Author

Just so we are on the same page, you are checking the journal, dmesg, /var/log, etc on the Travis build node, not inside the build container where server stop fails, right?

I am checking it inside the container... Since it is the container which is failing, I assumed that all the logs would be inside the container...

Do I have to run these commands in my travis build after the container fails?

@ymanton
Contributor

ymanton commented Sep 13, 2024

If the kernel is terminating the JVM because of lack of memory then the logs will not be in the container, they will be on the host.

Try the previous suggestions on the Travis build node, not in the container.
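
A minimal sketch of those host-side checks as one reusable filter. The function name is hypothetical, and the patterns are common kernel message fragments rather than an exhaustive list:

```shell
# Filter kernel log text on stdin down to lines that look like OOM-killer activity.
find_oom_kills() {
    grep -i -E 'out of memory|killed process|oom-killer' \
        || echo "no OOM-killer messages found"
}

# Intended use on the build node after the image build fails, not in the container:
#   sudo dmesg | find_oom_kills
#   sudo journalctl -k --no-pager | find_oom_kills
```

On Travis these calls could go in a step that runs after the failed build, for example an after_failure step, so that the kernel log still covers the time of the failure.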

@r30shah

r30shah commented Sep 13, 2024

Hi @Rohanjos97, regarding #635 (comment): I am looking for the system dump, which I think the original failure could not produce:

#8 44.68 JVMDUMP039I Processing dump event "gpf", detail "" at 2024/08/29 17:20:27 - please wait.
#8 44.69 JVMDUMP032I JVM requested System dump using '//core.20240829.172027.40.0001.dmp' in response to an event
#8 46.10 JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/lib/systemd/coredump-wrapper %E %P %u %g %s %t %c %h %e" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.78.  Review the manual for the external program to find where the core dump is written and ensure the program does not truncate it.
#8 46.10 
#8 52.02 JVMPORT049I The core file created by child process with pid = 78 was not found. Review the documentation for the /proc/sys/kernel/core_pattern program "|/usr/lib/systemd/coredump-wrapper %E %P %u %g %s %t %c %h %e" to find where the core file is written and ensure that program does not truncate it.
#8 52.02 
#8 52.05 JVMDUMP012E Error in System dump: //core.20240829.172027.40.0001.dmp

There may be something you need to do for core_pattern, or maybe some system limit is causing it (I am not sure).

The one you shared is a javacore, from which I cannot tell why we hit the floating point error.

@Rohanjos97
Author

Rohanjos97 commented Sep 13, 2024

Try the previous suggestions on the Travis build node, not in the container.

Here's the output of dmesg that I ran on the Travis node... I guess I got it wrong the whole time 😅. Thanks for pointing it out though.

Also, since we are talking about Travis, I think I should mention that the Travis node is not an s390x node but an amd64 node.

$ sudo dmesg
[    0.000000] Linux version 4.19.86 (root@57d0e642e9c6) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #1 SMP Tue Nov 1 11:03:14 UTC 2022
[    0.000000] Command line: tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=64 agent.use_vsock=false systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[    0.000000] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[    0.000000] x86/fpu: xstate_offset[5]:  960, xstate_sizes[5]:   64
[    0.000000] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]:  512
[    0.000000] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[    0.000000] x86/fpu: xstate_offset[9]: 2560, xstate_sizes[9]:    8
[    0.000000] x86/fpu: Enabled xstate features 0x2ff, context size is 2568 bytes, using 'compacted' format.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffddfff] usable
[    0.000000] BIOS-e820: [mem 0x000000007ffde000-0x000000007fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.8 present.
[    0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 2239001, primary cpu clock
[    0.000000] kvm-clock: using sched offset of 264820442 cycles
[    0.000004] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000009] tsc: Detected 2294.580 MHz processor
[    0.000492] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000494] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000496] last_pfn = 0x7ffde max_arch_pfn = 0x400000000
[    0.000528] MTRR default type: write-back
[    0.000528] MTRR fixed ranges enabled:
[    0.000529]   00000-9FFFF write-back
[    0.000530]   A0000-BFFFF uncachable
[    0.000530]   C0000-FFFFF write-protect
[    0.000531] MTRR variable ranges enabled:
[    0.000531]   0 base 0080000000 mask FF80000000 uncachable
[    0.000532]   1 disabled
[    0.000532]   2 disabled
[    0.000532]   3 disabled
[    0.000533]   4 disabled
[    0.000533]   5 disabled
[    0.000533]   6 disabled
[    0.000533]   7 disabled
[    0.000543] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[    0.013754] Using GB pages for direct mapping
[    0.013763] BRK [0x02401000, 0x02401fff] PGTABLE
[    0.013765] BRK [0x02402000, 0x02402fff] PGTABLE
[    0.013766] BRK [0x02403000, 0x02403fff] PGTABLE
[    0.013792] BRK [0x02404000, 0x02404fff] PGTABLE
[    0.013885] BRK [0x02405000, 0x02405fff] PGTABLE
[    0.013916] ACPI: Early table checksum verification disabled
[    0.013972] ACPI: RSDP 0x00000000000F5BA0 000014 (v00 BOCHS )
[    0.013975] ACPI: RSDT 0x000000007FFE2724 00003C (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.013987] ACPI: FACP 0x000000007FFE1ACB 000074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.013992] ACPI: DSDT 0x000000007FFDE040 003A8B (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
[    0.013996] ACPI: FACS 0x000000007FFDE000 000040
[    0.013998] ACPI: APIC 0x000000007FFE1B3F 000270 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.014001] ACPI: HPET 0x000000007FFE1DAF 000038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.014004] ACPI: SRAT 0x000000007FFE1DE7 0004D0 (v01 BOCHS  BXPCSRAT 00000001 BXPC 00000001)
[    0.014007] ACPI: SSDT 0x000000007FFE22B7 00038D (v01 BOCHS  NVDIMM   00000001 BXPC 00000001)
[    0.014010] ACPI: NFIT 0x000000007FFE2644 0000E0 (v01 BOCHS  BXPCNFIT 00000001 BXPC 00000001)
[    0.014020] ACPI: Local APIC address 0xfee00000
[    0.015113] Zone ranges:
[    0.015114]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.015116]   DMA32    [mem 0x0000000001000000-0x000000007ffddfff]
[    0.015116]   Normal   empty
[    0.015117]   Device   empty
[    0.015118] Movable zone start for each node
[    0.015118] Early memory node ranges
[    0.015119]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.015120]   node   0: [mem 0x0000000000100000-0x000000007ffddfff]
[    0.015649] Reserved but unavailable: 98 pages
[    0.015650] Initmem setup node 0 [mem 0x0000000000001000-0x000000007ffddfff]
[    0.015651] On node 0 totalpages: 524156
[    0.015655]   DMA zone: 64 pages used for memmap
[    0.015656]   DMA zone: 21 pages reserved
[    0.015657]   DMA zone: 3998 pages, LIFO batch:0
[    0.015723]   DMA32 zone: 8128 pages used for memmap
[    0.015724]   DMA32 zone: 520158 pages, LIFO batch:63
[    0.031109] ACPI: PM-Timer IO Port: 0x608
[    0.031112] ACPI: Local APIC address 0xfee00000
[    0.031129] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.031166] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.031169] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.031170] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.031171] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.031176] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.031176] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.031177] ACPI: IRQ0 used by override.
[    0.031178] ACPI: IRQ5 used by override.
[    0.031178] ACPI: IRQ9 used by override.
[    0.031179] ACPI: IRQ10 used by override.
[    0.031179] ACPI: IRQ11 used by override.
[    0.031181] Using ACPI (MADT) for SMP configuration information
[    0.031182] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.031186] smpboot: Allowing 64 CPUs, 56 hotplug CPUs
[    0.031220] [mem 0x80000000-0xfeffbfff] available for PCI devices
[    0.031220] Booting paravirtualized kernel on KVM
[    0.031222] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.098505] random: get_random_bytes called from start_kernel+0x81/0x42e with crng_init=0
[    0.098520] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:64 nr_node_ids:1
[    0.104293] percpu: Embedded 39 pages/cpu s127448 r0 d32296 u262144
[    0.104303] pcpu-alloc: s127448 r0 d32296 u262144 alloc=1*2097152
[    0.104304] pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 15 
[    0.104310] pcpu-alloc: [0] 16 17 18 19 20 21 22 23 [0] 24 25 26 27 28 29 30 31 
[    0.104315] pcpu-alloc: [0] 32 33 34 35 36 37 38 39 [0] 40 41 42 43 44 45 46 47 
[    0.104320] pcpu-alloc: [0] 48 49 50 51 52 53 54 55 [0] 56 57 58 59 60 61 62 63 
[    0.104349] KVM setup async PF for cpu 0
[    0.104354] kvm-stealtime: cpu 0, msr 7cc14e40
[    0.104362] Built 1 zonelists, mobility grouping on.  Total pages: 515943
[    0.104365] Kernel command line: tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=64 agent.use_vsock=false systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none
[    0.104606] log_buf_len individual max cpu contribution: 4096 bytes
[    0.104607] log_buf_len total cpu_extra contributions: 258048 bytes
[    0.104607] log_buf_len min size: 131072 bytes
[    0.104699] log_buf_len: 524288 bytes
[    0.104700] early log buf free: 121916(93%)
[    0.105304] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.105471] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.110781] Memory: 2029468K/2096624K available (10252K kernel code, 506K rwdata, 2044K rodata, 872K init, 1892K bss, 67156K reserved, 0K cma-reserved)
[    0.111664] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=1
[    0.112485] rcu: Hierarchical RCU implementation.
[    0.112488] 	All grace periods are expedited (rcu_expedited).
[    0.112577] NR_IRQS: 4352, nr_irqs: 936, preallocated irqs: 16
[    0.112789] rcu: 	Offload RCU callbacks from CPUs: (none).
[    0.113004] Console: colour *CGA 80x25
[    0.113053] ACPI: Core revision 20180810
[    0.113325] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[    0.113416] hpet clockevent registered
[    0.113422] APIC: Switch to symmetric I/O mode setup
[    0.113621] x2apic enabled
[    0.113860] Switched APIC routing to physical x2apic.
[    0.113862] KVM setup pv IPIs
[    0.114706] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.114732] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2113352c5e9, max_idle_ns: 440795314412 ns
[    0.114737] Calibrating delay loop (skipped) preset value.. 4589.16 BogoMIPS (lpj=9178320)
[    0.114738] pid_max: default: 65536 minimum: 512
[    0.114952] Security Framework initialized
[    0.115032] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.115035] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.115539] x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
[    0.115545] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.115545] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[    0.115551] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.115556] Spectre V2 : Mitigation: Enhanced IBRS
[    0.115557] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.115558] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    0.115561] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[    0.115955] Freeing SMP alternatives memory: 32K
[    0.116923] TSC deadline timer enabled
[    0.116939] smpboot: CPU0: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz (family: 0x6, model: 0x55, stepping: 0x7)
[    0.117040] Performance Events: Skylake events, Intel PMU driver.
[    0.117067] ... version:                2
[    0.117067] ... bit width:              48
[    0.117067] ... generic registers:      4
[    0.117068] ... value mask:             0000ffffffffffff
[    0.117068] ... max period:             000000007fffffff
[    0.117069] ... fixed-purpose events:   3
[    0.117069] ... event mask:             000000070000000f
[    0.117139] rcu: Hierarchical SRCU implementation.
[    0.118072] smp: Bringing up secondary CPUs ...
[    0.118128] x86: Booting SMP configuration:
[    0.118129] .... node  #0, CPUs:        #1
[    0.005928] kvm-clock: cpu 1, msr 2239041, secondary cpu clock
[    0.005928] x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
[    0.118710] KVM setup async PF for cpu 1
[    0.118710] kvm-stealtime: cpu 1, msr 7cc54e40
[    0.118710]   #2
[    0.005928] kvm-clock: cpu 2, msr 2239081, secondary cpu clock
[    0.005928] x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
[    0.118710] KVM setup async PF for cpu 2
[    0.118710] kvm-stealtime: cpu 2, msr 7cc94e40
[    0.118710]   #3
[    0.005928] kvm-clock: cpu 3, msr 22390c1, secondary cpu clock
[    0.005928] x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
[    0.118710] KVM setup async PF for cpu 3
[    0.118710] kvm-stealtime: cpu 3, msr 7ccd4e40
[    0.118710]   #4
[    0.005928] kvm-clock: cpu 4, msr 2239101, secondary cpu clock
[    0.005928] x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
[    0.118923] KVM setup async PF for cpu 4
[    0.118923] kvm-stealtime: cpu 4, msr 7cd14e40
[    0.118923]   #5
[    0.005928] kvm-clock: cpu 5, msr 2239141, secondary cpu clock
[    0.005928] x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
[    0.119376] KVM setup async PF for cpu 5
[    0.119376] kvm-stealtime: cpu 5, msr 7cd54e40
[    0.119376]   #6
[    0.005928] kvm-clock: cpu 6, msr 2239181, secondary cpu clock
[    0.005928] x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
[    0.119377] KVM setup async PF for cpu 6
[    0.119377] kvm-stealtime: cpu 6, msr 7cd94e40
[    0.119377]   #7
[    0.005928] kvm-clock: cpu 7, msr 22391c1, secondary cpu clock
[    0.005928] x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
[    0.119377] KVM setup async PF for cpu 7
[    0.119377] kvm-stealtime: cpu 7, msr 7cdd4e40
[    0.119377] smp: Brought up 1 node, 8 CPUs
[    0.119377] smpboot: Max logical packages: 64
[    0.119377] smpboot: Total of 8 processors activated (36713.28 BogoMIPS)
[    0.119443] devtmpfs: initialized
[    0.119443] x86/mm: Memory block size: 128MB
[    0.122800] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.122806] futex hash table entries: 16384 (order: 8, 1048576 bytes)
[    0.123715] NET: Registered protocol family 16
[    0.123846] cpuidle: using governor menu
[    0.123846] KVM setup pv remote TLB flush
[    0.123846] ACPI: bus type PCI registered
[    0.123846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.123846] PCI: Using configuration type 1 for base access
[    0.123846] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.123846] ACPI: Added _OSI(Module Device)
[    0.123846] ACPI: Added _OSI(Processor Device)
[    0.123846] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.123846] ACPI: Added _OSI(Processor Aggregator Device)
[    0.123846] ACPI: Added _OSI(Linux-Dell-Video)
[    0.123846] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    0.126818] ACPI: 2 ACPI AML tables successfully acquired and loaded
[    0.128396] ACPI: Interpreter enabled
[    0.128399] ACPI: (supports S0 S5)
[    0.128400] ACPI: Using IOAPIC for interrupt routing
[    0.128942] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.129124] ACPI: Enabled 4 GPEs in block 00 to 0F
[    0.134154] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.134163] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.134167] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.134394] acpiphp: Slot [3] registered
[    0.134413] acpiphp: Slot [4] registered
[    0.134434] acpiphp: Slot [5] registered
[    0.134453] acpiphp: Slot [6] registered
[    0.134470] acpiphp: Slot [7] registered
[    0.134492] acpiphp: Slot [8] registered
[    0.134509] acpiphp: Slot [9] registered
[    0.134528] acpiphp: Slot [10] registered
[    0.134549] acpiphp: Slot [11] registered
[    0.134566] acpiphp: Slot [12] registered
[    0.134586] acpiphp: Slot [13] registered
[    0.134604] acpiphp: Slot [14] registered
[    0.134621] acpiphp: Slot [15] registered
[    0.134642] acpiphp: Slot [16] registered
[    0.134659] acpiphp: Slot [17] registered
[    0.134678] acpiphp: Slot [18] registered
[    0.134697] acpiphp: Slot [19] registered
[    0.134716] acpiphp: Slot [20] registered
[    0.134736] acpiphp: Slot [21] registered
[    0.134754] acpiphp: Slot [22] registered
[    0.134772] acpiphp: Slot [23] registered
[    0.134796] acpiphp: Slot [24] registered
[    0.134820] acpiphp: Slot [25] registered
[    0.134845] acpiphp: Slot [26] registered
[    0.134863] acpiphp: Slot [27] registered
[    0.134880] acpiphp: Slot [28] registered
[    0.134901] acpiphp: Slot [29] registered
[    0.134918] acpiphp: Slot [30] registered
[    0.134937] acpiphp: Slot [31] registered
[    0.134944] PCI host bridge to bus 0000:00
[    0.134946] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.134947] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.134948] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.134949] pci_bus 0000:00: root bus resource [mem 0x80000000-0xfebfffff window]
[    0.134950] pci_bus 0000:00: root bus resource [mem 0x61c0000000-0x623fffffff window]
[    0.134951] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.134972] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    0.135237] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    0.135595] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[    0.138743] pci 0000:00:01.1: reg 0x20: [io  0xd120-0xd12f]
[    0.140090] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.140091] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.140092] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.140093] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.140210] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    0.140466] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4 ACPI
[    0.140470] pci 0000:00:01.3: quirk: [io  0x0700-0x070f] claimed by PIIX4 SMB
[    0.140618] pci 0000:00:02.0: [1b36:0001] type 01 class 0x060400
[    0.142742] pci 0000:00:02.0: reg 0x10: [mem 0xfe600000-0xfe6000ff 64bit]
[    0.144147] pci 0000:00:03.0: [1af4:1003] type 00 class 0x078000
[    0.146293] pci 0000:00:03.0: reg 0x10: [io  0xd000-0xd03f]
[    0.147720] pci 0000:00:03.0: reg 0x14: [mem 0xfe601000-0xfe601fff]
[    0.153727] pci 0000:00:03.0: reg 0x20: [mem 0xfea00000-0xfea03fff 64bit pref]
[    0.155141] pci 0000:00:04.0: [1af4:1004] type 00 class 0x010000
[    0.156677] pci 0000:00:04.0: reg 0x10: [io  0xd040-0xd07f]
[    0.158283] pci 0000:00:04.0: reg 0x14: [mem 0xfe602000-0xfe602fff]
[    0.164158] pci 0000:00:04.0: reg 0x20: [mem 0xfea04000-0xfea07fff 64bit pref]
[    0.165588] pci 0000:00:05.0: [1af4:1005] type 00 class 0x00ff00
[    0.166527] pci 0000:00:05.0: reg 0x10: [io  0xd100-0xd11f]
[    0.169839] pci 0000:00:05.0: reg 0x20: [mem 0xfea08000-0xfea0bfff 64bit pref]
[    0.173435] pci 0000:00:06.0: [1af4:1009] type 00 class 0x000200
[    0.174743] pci 0000:00:06.0: reg 0x10: [io  0xd080-0xd0bf]
[    0.176030] pci 0000:00:06.0: reg 0x14: [mem 0xfe603000-0xfe603fff]
[    0.179508] pci 0000:00:06.0: reg 0x20: [mem 0xfea0c000-0xfea0ffff 64bit pref]
[    0.181243] pci 0000:00:07.0: [1af4:1000] type 00 class 0x020000
[    0.182621] pci 0000:00:07.0: reg 0x10: [io  0xd0c0-0xd0ff]
[    0.183404] pci 0000:00:07.0: reg 0x14: [mem 0xfe604000-0xfe604fff]
[    0.186743] pci 0000:00:07.0: reg 0x20: [mem 0xfea10000-0xfea13fff 64bit pref]
[    0.188841] pci_bus 0000:01: extended config space not accessible
[    0.189026] acpiphp: Slot [0] registered
[    0.189049] acpiphp: Slot [1] registered
[    0.189066] acpiphp: Slot [2] registered
[    0.189094] acpiphp: Slot [3-2] registered
[    0.189116] acpiphp: Slot [4-2] registered
[    0.189137] acpiphp: Slot [5-2] registered
[    0.189161] acpiphp: Slot [6-2] registered
[    0.189182] acpiphp: Slot [7-2] registered
[    0.189202] acpiphp: Slot [8-2] registered
[    0.189225] acpiphp: Slot [9-2] registered
[    0.189246] acpiphp: Slot [10-2] registered
[    0.189267] acpiphp: Slot [11-2] registered
[    0.189290] acpiphp: Slot [12-2] registered
[    0.189309] acpiphp: Slot [13-2] registered
[    0.189332] acpiphp: Slot [14-2] registered
[    0.189353] acpiphp: Slot [15-2] registered
[    0.189373] acpiphp: Slot [16-2] registered
[    0.189397] acpiphp: Slot [17-2] registered
[    0.189417] acpiphp: Slot [18-2] registered
[    0.189438] acpiphp: Slot [19-2] registered
[    0.189461] acpiphp: Slot [20-2] registered
[    0.189480] acpiphp: Slot [21-2] registered
[    0.189503] acpiphp: Slot [22-2] registered
[    0.189524] acpiphp: Slot [23-2] registered
[    0.189543] acpiphp: Slot [24-2] registered
[    0.189566] acpiphp: Slot [25-2] registered
[    0.189587] acpiphp: Slot [26-2] registered
[    0.189607] acpiphp: Slot [27-2] registered
[    0.189630] acpiphp: Slot [28-2] registered
[    0.189651] acpiphp: Slot [29-2] registered
[    0.189673] acpiphp: Slot [30-2] registered
[    0.189694] acpiphp: Slot [31-2] registered
[    0.189829] pci 0000:00:02.0: PCI bridge to [bus 01]
[    0.189837] pci 0000:00:02.0:   bridge window [io  0xc000-0xcfff]
[    0.189847] pci 0000:00:02.0:   bridge window [mem 0xfe400000-0xfe5fffff]
[    0.189879] pci 0000:00:02.0:   bridge window [mem 0xfe800000-0xfe9fffff 64bit pref]
[    0.190702] pci_bus 0000:00: on NUMA node 0
[    0.190971] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.191044] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.191101] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.191158] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.191190] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    0.194790] vgaarb: loaded
[    0.194852] SCSI subsystem initialized
[    0.194867] PCI: Using ACPI for IRQ routing
[    0.194868] PCI: pci_cache_line_size set to 64 bytes
[    0.194964] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
[    0.194965] e820: reserve RAM buffer [mem 0x7ffde000-0x7fffffff]
[    0.195228] clocksource: Switched to clocksource kvm-clock
[    0.195228] pnp: PnP ACPI init
[    0.195228] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.195228] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.195228] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[    0.195228] pnp 00:03: [dma 2]
[    0.195228] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active)
[    0.195473] pnp: PnP ACPI: found 4 devices
[    0.201874] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.202503] pci 0000:00:02.0: PCI bridge to [bus 01]
[    0.202509] pci 0000:00:02.0:   bridge window [io  0xc000-0xcfff]
[    0.203273] pci 0000:00:02.0:   bridge window [mem 0xfe400000-0xfe5fffff]
[    0.203735] pci 0000:00:02.0:   bridge window [mem 0xfe800000-0xfe9fffff 64bit pref]
[    0.204642] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
[    0.204643] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff window]
[    0.204644] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[    0.204645] pci_bus 0000:00: resource 7 [mem 0x80000000-0xfebfffff window]
[    0.204645] pci_bus 0000:00: resource 8 [mem 0x61c0000000-0x623fffffff window]
[    0.204646] pci_bus 0000:01: resource 0 [io  0xc000-0xcfff]
[    0.204647] pci_bus 0000:01: resource 1 [mem 0xfe400000-0xfe5fffff]
[    0.204648] pci_bus 0000:01: resource 2 [mem 0xfe800000-0xfe9fffff 64bit pref]
[    0.204789] NET: Registered protocol family 2
[    0.205119] tcp_listen_portaddr_hash hash table entries: 1024 (order: 2, 16384 bytes)
[    0.205132] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[    0.205153] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.205184] TCP: Hash tables configured (established 16384 bind 16384)
[    0.205268] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[    0.205274] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[    0.205965] NET: Registered protocol family 1
[    0.206387] RPC: Registered named UNIX socket transport module.
[    0.206387] RPC: Registered udp transport module.
[    0.206387] RPC: Registered tcp transport module.
[    0.206388] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.206402] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.206409] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.206416] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.206458] PCI: CLS 0 bytes, default 64
[    0.206565] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2113352c5e9, max_idle_ns: 440795314412 ns
[    0.208899] workingset: timestamp_bits=46 max_order=19 bucket_order=0
[    0.211102] NFS: Registering the id_resolver key type
[    0.211110] Key type id_resolver registered
[    0.211110] Key type id_legacy registered
[    0.211112] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.211113] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
[    0.211114] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[    0.211357] fuse init (API version 7.27)
[    0.211450] SGI XFS with security attributes, no debug enabled
[    0.212122] 9p: Installing v9fs 9p2000 file system support
[    0.212217] NET: Registered protocol family 38
[    0.212222] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    0.212324] io scheduler noop registered (default)
[    0.212724] shpchp 0000:00:02.0: Requesting control of SHPC hotplug via OSHP (\_SB_.PCI0.S10_)
[    0.212728] shpchp 0000:00:02.0: Requesting control of SHPC hotplug via OSHP (\_SB_.PCI0)
[    0.212730] shpchp 0000:00:02.0: Cannot get control of SHPC hotplug
[    0.212734] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    0.212792] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    0.212795] ACPI: Power Button [PWRF]
[    0.232029] PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.248427] PCI Interrupt Link [LNKD] enabled at IRQ 10
[    0.264145] PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.280288] PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.301098] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.308698] random: fast init done
[    0.308923] random: crng init done
[    0.309876] console [hvc0] enabled
[    0.313200] brd: module loaded
[    0.314992] loop: module loaded
[    0.326251]  pmem0: p1
[    0.326359] pmem0: detected capacity change from 0 to 266338304
[    0.327255] scsi host0: Virtio SCSI HBA
[    0.461724] xt_time: kernel timezone is -0000
[    0.461747] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[    0.461837] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
[    0.461907] IPVS: ipvs loaded.
[    0.461909] IPVS: [rr] scheduler registered.
[    0.461910] IPVS: [wrr] scheduler registered.
[    0.461911] IPVS: [lc] scheduler registered.
[    0.461912] IPVS: [wlc] scheduler registered.
[    0.461912] IPVS: [fo] scheduler registered.
[    0.461913] IPVS: [ovf] scheduler registered.
[    0.461917] IPVS: [lblc] scheduler registered.
[    0.461919] IPVS: [lblcr] scheduler registered.
[    0.461920] IPVS: [dh] scheduler registered.
[    0.461921] IPVS: [sh] scheduler registered.
[    0.461922] IPVS: [sed] scheduler registered.
[    0.461923] IPVS: [nq] scheduler registered.
[    0.461925] IPVS: ftp: loaded support on port[0] = 21
[    0.461926] IPVS: [sip] pe registered.
[    0.462347] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
[    0.462432] Initializing XFRM netlink socket
[    0.462601] NET: Registered protocol family 10
[    0.463966] Segment Routing with IPv6
[    0.464021] NET: Registered protocol family 17
[    0.464071] Bridge firewalling registered
[    0.464119] 9pnet: Installing 9P2000 support
[    0.465451] NET: Registered protocol family 40
[    0.465603] sched_clock: Marking stable (463488623, 1928956)->(475086223, -9668644)
[    0.466164] EXT4-fs (pmem0p1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[    0.466738] EXT4-fs (pmem0p1): mounted filesystem with ordered data mode. Opts: dax,data=ordered,errors=remount-ro
[    0.466753] VFS: Mounted root (ext4 filesystem) readonly on device 259:1.
[    0.466834] devtmpfs: mounted
[    0.467794] Freeing unused kernel image memory: 872K
[    0.486792] Write protecting the kernel read-only data: 14336k
[    0.489711] Freeing unused kernel image memory: 2012K
[    0.489822] Freeing unused kernel image memory: 4K
[    0.489824] Run /sbin/init as init process
[    0.514437] systemd[1]: systemd 246 running in system mode. (+PAM +AUDIT -SELINUX +IMA -APPARMOR -SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 -IDN -PCRE2 default-hierarchy=hybrid)
[    0.514459] systemd[1]: Detected virtualization kvm.
[    0.514476] systemd[1]: Detected architecture x86-64.
[    0.516442] systemd[1]: No hostname configured.
[    0.516454] systemd[1]: Set hostname to <localhost>.
[    0.516520] systemd[1]: Initializing machine ID from random generator.
[    0.516547] systemd[1]: Installed transient /etc/machine-id file.
[    0.555250] systemd[1]: Queued start job for default target Kata Containers Agent Target.
[    0.555545] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[    0.555838] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[    0.555859] systemd[1]: Reached target Local Encrypted Volumes.
[    0.555874] systemd[1]: Reached target Paths.
[    0.555886] systemd[1]: Reached target Slices.
[    0.555897] systemd[1]: Reached target Sockets.
[    0.556068] systemd[1]: Listening on Process Core Dump Socket.
[    0.556139] systemd[1]: Condition check resulted in Kernel Trace File System being skipped.
[    0.556157] systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
[    0.556176] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[    0.556269] systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
[    0.557082] systemd[1]: Starting Remount Root and Kernel File Systems...
[    0.557114] systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
[    0.557662] systemd[1]: Starting Apply Kernel Variables...
[    0.558380] systemd[82]: systemd-remount-fs.service: Failed to connect stdout to the journal socket, ignoring: No such file or directory
[    0.559052] systemd[83]: systemd-sysctl.service: Failed to connect stdout to the journal socket, ignoring: No such file or directory
[    0.567737] systemd[1]: Finished Remount Root and Kernel File Systems.
[    0.567831] systemd[1]: Reached target Local File Systems (Pre).
[    0.567846] systemd[1]: Condition check resulted in Default Swap File (/var/swapfile) being skipped.
[    0.567859] systemd[1]: Reached target Swap.
[    0.568519] systemd[1]: Mounting Temporary Directory (/tmp)...
[    0.568560] systemd[1]: Condition check resulted in Flush Journal to Persistent Storage being skipped.
[    0.569464] systemd[84]: tmp.mount: Failed to connect stdout to the journal socket, ignoring: No such file or directory
[    0.570205] systemd[1]: Finished Apply Kernel Variables.
[    0.573466] systemd[1]: Mounted Temporary Directory (/tmp).
[    0.573527] systemd[1]: Reached target Local File Systems.
[    0.573561] systemd[1]: Condition check resulted in Store a System Token in an EFI Variable being skipped.
[    0.573584] systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
[    0.573597] systemd[1]: Reached target System Initialization.
[    0.573613] systemd[1]: Started Daily Cleanup of Temporary Directories.
[    0.573622] systemd[1]: Reached target Basic System.
[    0.573655] systemd[1]: Reached target Timers.
[    0.573712] systemd[1]: Condition check resulted in NTP client/server being skipped.
[    0.574405] systemd[1]: Started Kata Containers Agent.
[    0.574454] systemd[1]: Reached target Kata Containers Agent Target.
[    0.574510] systemd[1]: Startup finished in 505ms (kernel) + 66ms (userspace) = 572ms.
[    0.621102] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    0.697110] pci 0000:00:02.0: PCI bridge to [bus 01]
[    0.697134] pci 0000:00:02.0:   bridge window [io  0xc000-0xcfff]
[    0.700015] pci 0000:00:02.0:   bridge window [mem 0xfe400000-0xfe5fffff]
[    0.702188] pci 0000:00:02.0:   bridge window [mem 0xfe800000-0xfe9fffff 64bit pref]
[    0.706060] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    0.706457] sd 0:0:0:0: Power-on or device reset occurred
[    0.707329] sd 0:0:0:0: [sda] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[    0.707398] sd 0:0:0:0: [sda] Write Protect is off
[    0.707401] sd 0:0:0:0: [sda] Mode Sense: 63 00 00 08
[    0.707511] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.710078] sd 0:0:0:0: [sda] Attached SCSI disk
[    0.713861] XFS (sda): Mounting V5 Filesystem
[    0.731505] XFS (sda): Ending clean mount
[    1.184965] Built 1 zonelists, mobility grouping on.  Total pages: 538430
[    1.486809] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    1.901504] systemd[1]: run-libcontainer-893d9d0a06fae4ac4f5db77113a87da6a9a2f5d85b54eb01dd63f88accd8986c-893d9d0a06fae4ac4f5db77113a87da6a9a2f5d85b54eb01dd63f88accd8986c-runc.zZo6FW.mount: Succeeded.
[   40.806235] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[  126.335171] docker0: port 1(vethff13460) entered blocking state
[  126.335173] docker0: port 1(vethff13460) entered disabled state
[  126.335226] device vethff13460 entered promiscuous mode
[  126.335303] IPv6: ADDRCONF(NETDEV_UP): vethff13460: link is not ready
[  126.335304] docker0: port 1(vethff13460) entered blocking state
[  126.335306] docker0: port 1(vethff13460) entered forwarding state
[  126.335672] docker0: port 1(vethff13460) entered disabled state
[  126.497939] IPVS: ftp: loaded support on port[0] = 21
[  126.611935] eth0: renamed from veth9cc4265
[  126.612593] IPv6: ADDRCONF(NETDEV_CHANGE): vethff13460: link becomes ready
[  126.612672] docker0: port 1(vethff13460) entered blocking state
[  126.612676] docker0: port 1(vethff13460) entered forwarding state
[  126.612744] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
The command "sudo dmesg" exited with 0.

Hi @Rohanjos97, regarding #635 (comment), I am looking for the system dump, which I think the original failure could not reproduce,

@r30shah, can you please have a look at these logs and let me know if they are helpful? It seems I was looking in the wrong place earlier when I posted that dump. Let me know if you need anything else.

Thanks!

@ymanton
Contributor

ymanton commented Sep 14, 2024

Also, since we are talking about Travis, I think I should mention that the Travis node is not an s390x node but an amd64 node.

Ok, that's a pretty important detail. If you're using QEMU or some other emulator you should check its logs for errors.
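
A minimal sketch of how one might confirm that the s390x build is emulated before looking for emulator logs. It assumes a POSIX shell on the build node and that any user-mode QEMU handlers are registered through `binfmt_misc`. The `needs_emulation` helper and the `TARGET_ARCH` variable are hypothetical names used only for illustration, and architectures are compared in `uname -m` form (`x86_64`, not `amd64`).

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the host and target architectures differ,
# i.e. when the target binaries cannot run natively on this host.
needs_emulation() {
    host_arch="$1"
    target_arch="$2"
    [ "$host_arch" != "$target_arch" ]
}

host_arch="$(uname -m)"               # e.g. x86_64 on an amd64 node
target_arch="${TARGET_ARCH:-s390x}"   # assumed target, override as needed

if needs_emulation "$host_arch" "$target_arch"; then
    echo "host is $host_arch, target is $target_arch: build runs under emulation"
    # qemu-user handlers registered with the kernel are listed here, if any
    ls /proc/sys/fs/binfmt_misc/ 2>/dev/null | grep -i qemu \
        || echo "no qemu binfmt_misc handlers found"
else
    echo "host and target are both $host_arch: native build"
fi
```

If a `qemu-s390x` entry is listed, the container's processes are being translated by QEMU user-mode emulation, and that is the layer whose output is worth checking.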

@Rohanjos97
Author

Hi guys, I shared the output of the dmesg command, which I ran on the Travis node (instead of inside the container). Please let me know if these logs are of any help.

Meanwhile I'll try to fetch the QEMU logs as well
