Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scala code compiled differently between OpenJDK11 and Semeru Openj9, causing VerifyError #14054

Closed
Sidong-Wei opened this issue Dec 1, 2021 · 37 comments
Assignees
Milestone

Comments

@Sidong-Wei
Copy link

Java -version output

openjdk version "11.0.13" 2021-10-19
IBM Semeru Runtime Open Edition 11.0.13.0 (build 11.0.13+8)
Eclipse OpenJ9 VM 11.0.13.0 (build openj9-0.29.0, JRE 11 Linux s390x-64-Bit Compressed References 20211021_227 (JIT enabled, AOT enabled)
OpenJ9 - e1e72c4
OMR - 299b6a2d2
JCL - 2d83aa3b76 based on jdk-11.0.13+8)

Summary of problem

I build Apache Kafka project version 3.0.0 with Semeru JDK 11.0.13 on a s390x machine, and when I started the service I found the following Verify error:

java.lang.VerifyError: JVMVRFY021 thrown object not throwable; class=kafka/log/AbstractIndex, method=<init>(Ljava/io/File;JIZ)V, pc=274
Exception Details:
  Location:
    kafka/log/AbstractIndex.<init>(Ljava/io/File;JIZ)V @274: JBathrow
  Reason:
    Type 'kafka/log/AbstractIndex' (current frame, stack[0]) is not assignable to 'java/lang/Throwable'
  Current Frame:
    bci: @274
    flags: { }
    locals: { 'kafka/log/AbstractIndex', 'java/io/File', long, long_2nd, integer, integer, integer, 'java/io/RandomAccessFile' }
    stack: { 'kafka/log/AbstractIndex' }
  Exception Handler Table:
    bci [85, 249] => handler: 249
  Stackmap Table:
    full_frame(@69,{Object[#1],Object[#60],long,integer,integer,integer},{Object[#1]})
    full_frame(@83,{Object[#1],Object[#60],long,integer,integer,integer},{Object[#1],Object[#144]})
    full_frame(@131,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144]},{Object[#1]})
    same_locals_1_stack_item_frame(@151,Object[#1])
    same_locals_1_stack_item_frame(@184,Object[#1])
    full_frame(@200,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144]},{Object[#1],Object[#154]})
    full_frame(@217,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144],Object[#154]},{Object[#1]})
    same_locals_1_stack_item_frame(@244,Object[#1])
    full_frame(@249,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144]},{Object[#27]})
    full_frame(@275,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144],Object[#154]},{Object[#1],Object[#154]})
	at kafka.log.LazyIndex$.$anonfun$forOffset$1(LazyIndex.scala:106)
	at kafka.log.LazyIndex$$$Lambda$1053/0x0000000000000000.apply(Unknown Source)
	at kafka.log.LazyIndex.$anonfun$get$1(LazyIndex.scala:63)
	at kafka.log.LazyIndex.get(LazyIndex.scala:60)
	at kafka.log.LogSegment.offsetIndex(LogSegment.scala:64)
	at kafka.log.LogSegment.readNextOffset(LogSegment.scala:456)
	at kafka.log.Log.$anonfun$recoverLog$6(Log.scala:921)
	at kafka.log.Log$$Lambda$1055/0x0000000000000000.apply$mcJ$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
	at scala.Option.getOrElse(Option.scala:201)
	at kafka.log.Log.recoverLog(Log.scala:921)
	at kafka.log.Log.$anonfun$loadSegments$3(Log.scala:801)
	at kafka.log.Log$$Lambda$1050/0x0000000000000000.apply$mcJ$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
	at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2465)
	at kafka.log.Log.loadSegments(Log.scala:801)
	at kafka.log.Log.<init>(Log.scala:328)
	at kafka.log.Log$.apply(Log.scala:2601)
	at kafka.log.LogManager.$anonfun$getOrCreateLog$1(LogManager.scala:830)
	at kafka.log.LogManager$$Lambda$1022/0x0000000000000000.apply(Unknown Source)
	at scala.Option.getOrElse(Option.scala:201)
	at kafka.log.LogManager.getOrCreateLog(LogManager.scala:783)
	at kafka.cluster.Partition.createLog(Partition.scala:344)
	at kafka.cluster.Partition.createLogIfNotExists(Partition.scala:324)
	at kafka.cluster.Partition.$anonfun$makeLeader$1(Partition.scala:564)
	at kafka.cluster.Partition.makeLeader(Partition.scala:548)
	at kafka.server.ReplicaManager.$anonfun$makeLeaders$5(ReplicaManager.scala:1568)
	at kafka.server.ReplicaManager$$Lambda$1013/0x0000000000000000.apply(Unknown Source)
	at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)
	at kafka.server.ReplicaManager$$Lambda$1014/0x0000000000000000.apply(Unknown Source)
	at scala.collection.mutable.HashMap$Node.foreachEntry(HashMap.scala:633)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:499)
	at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:1566)
	at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:1411)
	at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:258)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:171)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:74)
	at java.base/java.lang.Thread.run(Thread.java:836)

The same problem is not observed with Temurin JDK 11.0.13 or with Semeru JDK 11.0.13 on a amd64 machine.

Diagnostic files

As the error message suggested, I examine the compiled Java bytecode on different environment. I found that the issue lies within this source file. The compiled code of this file is different among different JDKs and archs while other files are consistent. Here is part of the difference:

Semeru JDK 11.0.13 on s390x:

  public kafka.log.AbstractIndex(java.io.File, long, int, boolean);
    descriptor: (Ljava/io/File;JIZ)V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=7, locals=12, args_size=5
         0: aload_0
         1: aload_1
         2: putfield      #104                // Field _file:Ljava/io/File;
...
       240: invokevirtual #566                // Method java/nio/MappedByteBuffer.position:(I)Ljava/nio/Buffer;
       243: pop
       244: aload         8
       246: goto          275
       249: pop
       250: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       253: aload         7
       255: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       260: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       263: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       266: pop
       267: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       270: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       273: aload_0
       274: athrow
       275: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       278: aload         7
       280: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       285: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       288: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       291: pop
       292: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       295: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       298: putfield      #196                // Field mmap:Ljava/nio/MappedByteBuffer;
       301: aload_0
       302: aload_0
       303: invokevirtual #364                // Method mmap:()Ljava/nio/MappedByteBuffer;
       306: invokevirtual #538                // Method java/nio/MappedByteBuffer.limit:()I
       309: aload_0
       310: invokevirtual #190                // Method entrySize:()I
       313: idiv
       314: putfield      #207                // Field _maxEntries:I
       317: aload_0
       318: aload_0
       319: invokevirtual #364                // Method mmap:()Ljava/nio/MappedByteBuffer;
...
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
          202      44     8   idx   Ljava/nio/MappedByteBuffer;
           47     251     6 newlyCreated   Z
           85     213     7   raf   Ljava/io/RandomAccessFile;
            0     334     0  this   Lkafka/log/AbstractIndex;
            0     334     1 _file   Ljava/io/File;
            0     334     2 baseOffset   J
            0     334     4 maxIndexSize   I
            0     334     5 writable   Z
          139       8     9 roundDownToExactMultiple_factor   I
          232       8    10 roundDownToExactMultiple_number   I
          232       8    11 roundDownToExactMultiple_factor   I
    MethodParameters:
      Name                           Flags
      _file                          final
      baseOffset                     final
      maxIndexSize                   final
      writable                       final

Attaching the full text here: semeru_disassembled.txt

Temurin JDK 11.0.13 on s390x (Semeru JDK 11.0.13 on amd64 is similar):

  public kafka.log.AbstractIndex(java.io.File, long, int, boolean);
    descriptor: (Ljava/io/File;JIZ)V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=7, locals=13, args_size=5
         0: aload_0
         1: aload_1
...
       240: invokevirtual #566                // Method java/nio/MappedByteBuffer.position:(I)Ljava/nio/Buffer;
       243: pop
       244: aload         8
       246: goto          277
       249: astore        9
       251: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       254: aload         7
       256: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       261: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       264: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       267: pop
       268: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       271: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       274: aload         9
       276: athrow
       277: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       280: aload         7
       282: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       287: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       290: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       293: pop
       294: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       297: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       300: putfield      #196                // Field mmap:Ljava/nio/MappedByteBuffer;
       303: aload_0
       304: aload_0
       305: invokevirtual #364                // Method mmap:()Ljava/nio/MappedByteBuffer;
       308: invokevirtual #538                // Method java/nio/MappedByteBuffer.limit:()I
       311: aload_0
       312: invokevirtual #190                // Method entrySize:()I
       315: idiv
       316: putfield      #207                // Field _maxEntries:I
       319: aload_0
       320: aload_0
       321: invokevirtual #364                // Method mmap:()Ljava/nio/MappedByteBuffer;
...
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
          202      44     8   idx   Ljava/nio/MappedByteBuffer;
           47     253     6 newlyCreated   Z
           85     215     7   raf   Ljava/io/RandomAccessFile;
            0     336     0  this   Lkafka/log/AbstractIndex;
            0     336     1 _file   Ljava/io/File;
            0     336     2 baseOffset   J
            0     336     4 maxIndexSize   I
            0     336     5 writable   Z
          139       8    10 roundDownToExactMultiple_factor   I
          232       8    11 roundDownToExactMultiple_number   I
          232       8    12 roundDownToExactMultiple_factor   I
    MethodParameters:
      Name                           Flags
      _file                          final
      baseOffset                     final
      maxIndexSize                   final
      writable                       final

Attaching the full text here: temurin_disassembled.txt

From the comparison, we could tell that the local variable no. 9 on Temurin machine is missing on Semeru machine. There are two operations involving this variable 249: astore 9 and 274: aload 9, whereas the latter one was mistakenly compiled to 273: aload_0 on Semeru s390x machine, which leading to the next operation 274: athrow to throw the class object itself (while it actually is supposed to throw local variable 9). This assumption matches what the error message is saying (kafka/log/AbstractIndex is not throwable as it was not supposed to be thrown).

I compared the source code with the compiled java bytecode, and I believe the snippet above is compiled from the following source file block (line 109 - 138 from file):

  protected var mmap: MappedByteBuffer = {
    val newlyCreated = file.createNewFile()
    val raf = if (writable) new RandomAccessFile(file, "rw") else new RandomAccessFile(file, "r")
    try {
      /* pre-allocate the file if necessary */
      if(newlyCreated) {
        if(maxIndexSize < entrySize)
          throw new IllegalArgumentException("Invalid max index size: " + maxIndexSize)
        raf.setLength(roundDownToExactMultiple(maxIndexSize, entrySize))
      }

      /* memory-map the file */
      _length = raf.length()
      val idx = {
        if (writable)
          raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, _length)
        else
          raf.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, _length)
      }
      /* set the position in the index for the next entry */
      if(newlyCreated)
        idx.position(0)
      else
        // if this is a pre-existing index, assume it is valid and set position to last entry
        idx.position(roundDownToExactMultiple(idx.limit(), entrySize))
      idx
    } finally {
      CoreUtils.swallow(raf.close(), AbstractIndex)
    }
  }

Apparently, there are some bugs when compiling the code block above with Semeru on s390x, while the bugs are not present when compile on AMD64 or using Temurin JDK. The most obvious sign of the bug is the missing 9 local variable, which unfortunately is not shown on the local variable table as well. I am guessing the 9 variable may relate to the try block as it is supposed to be thrown at a certain point, but I cannot get further information on that, and I also could not get a simpler reproduction test case as the issue seems to be pretty subtle and tricky.

Another thing that might help with debugging this issue is the scala compilation options. In the environment above, I have been using Kafka's default scala compilation flags, which is:

2021-12-01T14:13:43.183+0000 [DEBUG] [org.gradle.api.internal.tasks.scala.ZincScalaCompilerFactory] [zinc] The Scala compiler is invoked with:
	-deprecation
	-unchecked
	-deprecation
	-unchecked
	-encoding
	utf8
	-Xlog-reflective-calls
	-feature
	-language:postfixOps
	-language:implicitConversions
	-language:existentials
	-Xlint:constant
	-Xlint:delayedinit-select
	-Xlint:doc-detached
	-Xlint:missing-interpolator
	-Xlint:nullary-unit
	-Xlint:option-implicit
	-Xlint:package-object-classes
	-Xlint:poly-implicit-overload
	-Xlint:private-shadow
	-Xlint:stars-align
	-Xlint:type-parameter-shadow
	-Xlint:unused
	-opt:l:inline
	-opt-inline-from:org.apache.kafka.**
	-opt-inline-from:kafka.**
	-opt-warnings
	-Xlint:strict-unsealed-patmat
	-Xfatal-warnings
	-release
	8
	-bootclasspath
	/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang/scala-library/2.13.6/ed7a2f528c7389ea65746c22a01031613d98ab3d/scala-library-2.13.6.jar
	-classpath
	/home/sidong/src/kafka/server-common/build/classes/java/main:/home/sidong/src/kafka/clients/build/classes/java/main:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.dataformat/jackson-dataformat-csv/2.12.3/eeaa685648b9eec8e31e74e076bc822d719ff63f/jackson-dataformat-csv-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.datatype/jackson-datatype-jdk8/2.12.3/77424ea087313312e308dae5ff8445608aabb5e1/jackson-datatype-jdk8-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.core/jackson-databind/2.12.3/d6153f8fc60c479ab0f9efb35c034526436a4953/jackson-databind-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.core/jackson-annotations/2.12.3/7275513412694a1aafd08c0287f48469fa0e6e17/jackson-annotations-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.core/jackson-core/2.12.3/deb23fe2a7f2b773e18ced2b50d4acc1df8fa366/jackson-core-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.module/jackson-module-scala_2.13/2.12.3/59f934e4fa34ad07afae4e80e9057c89bea9b52c/jackson-module-scala_2.13-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang.modules/scala-collection-compat_2.13/2.4.4/9cb262981ba1fac1f25c0e7e2b285d536ec8923b/scala-collection-compat_2.13-2.4.4.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang.modules/scala-java8-compat_2.13/1.0.0/8ffac68615b438933fe27506e755d790a68b8bab/scala-java8-compat_2.13-1.0.0.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.typesafe.scala-logging/scala-logging_2.13/3.9.3/d97c3bcdcf6179e110af8ad2a64ca276843395c1/scala-logging_2.13-3.9.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang/scala-reflect/2.13.6/bb523e56c63746a5752dece28fcd702c54fd3a11/scala-reflect-2.13.6.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang/scala-library/2.13.6/ed7a2f528c7389ea65746c22a01031613d98ab3d/scala-library-2.13.6.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/log4j/log4j/1.2.17/5af35056b4d257e4b64b9e8069c0746e8b08629f/log4j-1.2.17.jar:/home/sidong/src/kafka/metadata/build/classes/java/main:/home/sidong/src/kafka/raft/build/classes/java/main:/home/sidong/src/kafka/storage/build/classes/java/main:/home/sidong/.gradle/caches/modules-2/files-2.1/net.sourceforge.argparse4j/argparse4j/0.7.0/6f0621d0c3888de39e0f06d01f37ba53a798e657/argparse4j-0.7.0.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/net.sf.jopt-simple/jopt-simple/5.0.4/4fdac2fbe92dfad86aa6e9301736f6b4342a3f5c/jopt-simple-5.0.4.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.yammer.metrics/metrics-core/2.2.0/f82c035cfa786d3cbec362c38c22a5f5b1bc8724/metrics-core-2.2.0.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.dropwizard.metrics/metrics-core/4.1.12.1/cb2f351bf4463751201f43bb99865235d5ba07ca/metrics-core-4.1.12.1.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.apache.zookeeper/zookeeper/3.6.3/a6e74f826db85ff8c51c15ef0fa2ea0b462aef25/zookeeper-3.6.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-api/1.7.30/b5a4b6d16ab13e34a88fae84c35cd5d68cac922c/slf4j-api-1.7.30.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/commons-cli/commons-cli/1.4/c51c00206bb913cd8612b24abd9fa98ae89719b1/commons-cli-1.4.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.thoughtworks.paranamer/paranamer/2.8/619eba74c19ccf1da8ebec97a2d7f8ba05773dd6/paranamer-2.8.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.apache.zookeeper/zookeeper-jute/3.6.3/8990d19ec3db01f45f82d4011a11b085db66de05/zookeeper-jute-3.6.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.apache.yetus/audience-annotations/0.5.0/55762d3191a8d6610ef46d11e8cb70c7667342a3/audience-annotations-0.5.0.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-handler/4.1.62.Final/78bdd07b607bf1e3f7d74702fd8e56fd4dceb9a6/netty-handler-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-transport-native-epoll/4.1.62.Final/d5ad8ffa470d4579da6939f452ec2c85708974f2/netty-transport-native-epoll-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-codec/4.1.62.Final/922a898d93002486fb20a82ac4fbca1e42b87ee0/netty-codec-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-transport-native-unix-common/4.1.62.Final/d6bca9220b0594a3ad85ddc07e4aadf6562e78a2/netty-transport-native-unix-common-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-transport/4.1.62.Final/e1c0d805e9b28dd6c4075eb8c08be650314f317/netty-transport-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-resolver/4.1.62.Final/6f5ed2b21695e3d4c678e250799afe8e6a1dbe1b/netty-resolver-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-buffer/4.1.62.Final/6b030204124386f3b96e563b173b3239d633356c/netty-buffer-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-common/4.1.62.Final/56d494a54633ee63d49248a78d2a6dc83d0c4e0e/netty-common-4.1.62.Final.jar:/home/sidong/src/kafka/core/build/classes/java/main:/home/sidong/src/kafka/core/build/classes/scala/main

And I actually found that if I turn off the inliner optimization of Scala compiler, it will leads to different Java machine code but the issue will not persist on s390x with Semeru JDK. Please see this attached file:semeru_no_inliner_disassembled.txt
As mentioned, turning off inliner will lead to a different bytecode file but for this specific class kafka/log/AbstractIndex, the local variable 9 and the two corresponding operation will be correctly compiled, thus the VerifyError will not emerge. So it seems the bug only happens on s390x and while the inliner is turned on.

It will be really appreciated if someone could look into this error and find out why one local variable will be missing when compiling scala code with Semeru JDK11. Thank you.

@pshipton
Copy link
Member

pshipton commented Dec 2, 2021

@tajila

@tajila
Copy link
Contributor

tajila commented Dec 2, 2021

@ChengJin01 Please take a look at this

@ChengJin01
Copy link

ChengJin01 commented Dec 2, 2021

Just talked to @Sidong-Wei offline. The reason why the VerifyError was captured during runtime verification (which is correct as AbstractIndex is not throwable) is that the Scala scource was already mistakenly compiled to the wrong bytecode on s390x by the Gradle compiler, which is out of VM scope given the bytecode is statically generated in the .jar file before running the bytecode.
@Sidong-Wei might need to investigate why the bytecode wasn't compiled correctly as expected on s390x. e.g. whether there is any connection to the java compiler (javac) or something else specific to s390x.

@Sidong-Wei
Copy link
Author

Hi @ChengJin01 thanks a lot for your input, I think you made a good point in claiming the issue has nothing to do with the Verifier or JVM as the issue happens bytecode was compiled statically. I further looked into how the Kafka project use java compiler and scala compiler to compile its source code.
I found that it is using the gradle scala plugin, which use the zinc compiler to compile its hybrid java and scala code. I checked the source code of the compiler and tested with different JDKs, and I did not find any evidence that it has platform specific code or jvm specific code, which leads to the assumption that the issue I experienced may be still within the JDK scope (based on different bytecode on same machine with different JDK).
I understand the difference between Openj9 and Hotspot is mainly on the JVM side, but I believe there is still difference between these variants in terms of libraries and javac implementation. (i.e. I remember the ThreadMXBean.getThreadAllocatedBytes method from com.sun.management package is implemented in Hotspot but not in Openj9), so I wonder if there is someone in the openj9 community who coule help with this type of issue (with javac or shared library)? Since the issue is both platform-relevant and JDK-relevant, I suspect the issue may be with the specific javac implementation on s390x in openj9 repo, and any help to further narrow the issue down would be greatly appreciated, thanks.

@ChengJin01
Copy link

ChengJin01 commented Dec 7, 2021

@Sidong-Wei, the java compiler (javac) is implemented in OpenJDK which doesn't belong to OpenJ9.

The question is whether the zinc compiler depends on javac to do compilation. Even so, I still doubt the issue was triggered by javac. So it is likely there is intermediate source code generated by zinc prior to the bytecode, which is different across platforms.

@ChengJin01
Copy link

ChengJin01 commented Dec 9, 2021

I double-checked this project at https://github.com/apache/kafka/tree/3.0 and noticed that the kafka/core/src/main/scala/kafka/log/AbstractIndex.scala was compiled to kafka/core/build/classes/scala/main/kafka/log/AbstractIndex.class with the command ./gradlew core:compileScala wich launched at least 3 java processes during the building progress as follows:

[1] The background GradleDaemon:
jdk11_openj9_semeru/bin/java -Xss4m -XX:+UseParallelGC --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.base/java.nio.charset=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED -Xmx2g -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant -cp /home/jenkins/.gradle/wrapper/dists/gradle-7.1.1-all/1wqbeia0d49252rmlcr6o8lbl/gradle-7.1.1/lib/gradle-launcher-7.1.1.jar org.gradle.launcher.daemon.bootstrap.GradleDaemon 7.1.1

[2] The GradleWrapperMain (command line parser):
jdk11_openj9_semeru/bin/java -Xint -Xmx64m -Xms64m -Dorg.gradle.appname=gradlew -classpath /home/jenkins/jchau_ffi/14054_apache_kafka_jdk11/kafka/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain core:compileScala

[3] The Gradle Worker Daemon:
e.g.
jdk11_openj9_semeru/bin/java -Xss4m -XX:+UseParallelGC @/home/jenkins/.gradle/.tmp/gradle-worker-classpath2336945854620526368txt -Xmx2048m -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Worker Daemon 1'

By modifying kafka/build.gradle to disable JIT on The Gradle Worker Daemon as follows:

  defaultJvmArgs = ["-Xint", "-Xss4m", "-XX:+UseParallelGC"]

the issue disappeared (tried many times) and ended up with the correct bytecode:

  public kafka.log.AbstractIndex(java.io.File, long, int, boolean);
    descriptor: (Ljava/io/File;JIZ)V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=7, locals=13, args_size=5
...
       244: aload         8
       246: goto          277
       249: astore        9  <----- push the thrown vlaue to the local variable #9
       251: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       254: aload         7
       256: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       261: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       264: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       267: pop
       268: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       271: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       274: aload         9  <----- load the thrown vlaue from the local variable #9
       276: athrow
...
      Exception table:
         from    to  target type
            85   249   249   any

My understanding is that the scala source must be loaded/compiled directly to bytecode in memory and was most likely optimized in there by JIT before being outputted into the class file but still unclear where the optimization occurred.

So we need to get the JIT team involved in the investigation to see how & why this happened in this case.

FYI: @0xdaryl

@DanHeidinga
Copy link
Member

@0xdaryl can you update this for the 0.30 release?

@0xdaryl
Copy link
Contributor

0xdaryl commented Jan 12, 2022

JIT hasn't had a chance to look at the failure yet. Thanks @Sidong-Wei for your thorough investigation upon opening the PR. Given the apparent Z-specific nature of the problem, I'll ask @joransiu to assign for JIT investigation based on @ChengJin01's helpful analysis so far.

This seems unlikely to be resolved for 0.30.

@pshipton
Copy link
Member

I'll move it out.

@jonathan-albrecht-ibm
Copy link

jonathan-albrecht-ibm commented Jun 24, 2022

I'm wondering if I could get an update on where this issue stands? Is there a plan to implement it in a particular release?
I retested this against the current kafka 3.2.0 source release and this issue still occurs.
I tested with the IBM Semeru Runtime Certified Edition 11.0.15.0 (build 11.0.15+10)

@pshipton
Copy link
Member

@joransiu it seems the question is for you.

@joransiu
Copy link
Member

Apologies for the delay. I missed the original tag, so unfortuantely, no updates to report. Will have someone look into this from JIT perspective.

@VermaSh
Copy link
Contributor

VermaSh commented Jun 29, 2022

Hey, @Sidong-Wei I am going to be looking in to this failure. What commands did you use to setup and run?
Is this it?

curl -kn0OL https://archive.apache.org/dist/kafka/3.0.0/kafka-3.0.0-src.tgz
tar -xf kafka-3.0.0-src.tgz
cd kafka-3.0.0-src/

bin/zookeeper-server-start.sh config/zookeeper.properties

# In a separate console
bin/kafka-server-start.sh config/server.properties

@jonathan-albrecht-ibm
Copy link

Hi @VermaSh I can help you with this issue. You can reproduce it from the unit tests which may be easier.

It's actually easier to setup with kafka 3.2.0 now which has better support for s390x in the jar dependencies.

mkdir src
curl -kn0OL https://archive.apache.org/dist/kafka/3.2.0/kafka-3.2.0-src.tgz
cd src/
tar xf ../kafka-3.2.0-src.tgz
cd kafka-3.2.0-src/
# I'm using IBM Semeru Runtime Certified Edition 11.0.15.0 (build 11.0.15+10)
export JAVA_HOME=/opt/jdk-11.0.15+10
./gradlew jar
# At this point you should be able to run through the quickstart guide but I haven't tried it.
# Several of the tests in the core module will fail. This is one of them:
./gradlew :core:test --tests FetchRequestBetweenDifferentIbpTest.testControllerOldToNewIBP

you should see an error that starts with:

FetchRequestBetweenDifferentIbpTest > testControllerOldToNewIBP() FAILED                                                    java.lang.VerifyError: JVMVRFY021 thrown object not throwable; class=kafka/log/AbstractIndex, method=<init>(Ljava/io/File;JIZ)V, pc=274                                                                                                         Exception Details:                                                                                                        Location:
        kafka/log/AbstractIndex.<init>(Ljava/io/File;JIZ)V @274: JBathrow
      Reason:
        Type 'kafka/log/AbstractIndex' (current frame, stack[0]) is not assignable to 'java/lang/Throwable'
      Current Frame:
        bci: @274
        flags: { }
        locals: { 'kafka/log/AbstractIndex', 'java/io/File', long, long_2nd, integer, integer, integer, 'java/io/RandomAccessFile' }
        stack: { 'kafka/log/AbstractIndex' }
      Exception Handler Table:
        bci [85, 249] => handler: 249
      Stackmap Table:
...

Let me know if I can help

@jonathan-albrecht-ibm
Copy link

Hi @VermaSh, just wondering if there is any update on this issue?

@VermaSh
Copy link
Contributor

VermaSh commented Oct 18, 2022

Hi @jonathan-albrecht-ibm, Sorry I got swamped with some other work so don't have an update right now. But I'll try to have an update over next few days.

@VermaSh
Copy link
Contributor

VermaSh commented Oct 28, 2022

Quick update: Disabling asynchronous JIT compilation for gradle worker daemon fixes the failures. Doing a binary search now to get the failing JIT method.

@jonathan-albrecht-ibm
Copy link

Hi @VermaSh, just wanted to check if you were able to find the failing JIT method.

@VermaSh
Copy link
Contributor

VermaSh commented Feb 1, 2023

Hey @jonathan-albrecht-ibm It fails when scala/tools/asm/tree/analysis/Analyzer.analyze(Ljava/lang/String;Lscala/tools/asm/tree/MethodNode;)[Lscala/tools/asm/tree/analysis/Frame; is compiled at opt level hot. Limiting the method to opt level warm or below results in correct byte being generated for AbstractIndex : -Xjit:'{scala/tools/asm/tree/analysis/Analyzer.analyze(Ljava/lang/String*}(traceFull,optLevel=warm,log=trace.log)'

I am going through the method log now, so will have an other update for you soon.

@VermaSh
Copy link
Contributor

VermaSh commented Feb 21, 2023

I was able to narrow it down further to class hierarchy table. Running with disableCHOpts fixes the failures. Next step here is to figure out what exactly about ch opts causes the failure.

@jonathan-albrecht-ibm
Copy link

Hi @VermaSh, just wanted to check in and see if you had any new info on this issue.

@VermaSh
Copy link
Contributor

VermaSh commented Apr 3, 2023

Hi @jonathan-albrecht-ibm, unfortunately haven't had a chance to investigate further into the failure. I am hoping to have some free bandwidth later this week as things are cooling down for the other project.

@VermaSh
Copy link
Contributor

VermaSh commented May 1, 2023

Haven't have able to make much progress here as I had to investigate a blocker on zLinux. I'll post an update as soon as I make progress for this failure.

@jonathan-albrecht-ibm
Copy link

Hi @VermaSh, just wanted to check in and see if you had any new info on this issue.

@jonathan-albrecht-ibm
Copy link

Hi @VermaSh, just wanted to check in again on this issue

@VermaSh
Copy link
Contributor

VermaSh commented Jan 25, 2024

@jonathan-albrecht-ibm sorry, haven't had the bandwidth to look into this.

@r30shah
Copy link
Contributor

r30shah commented Jan 25, 2024

@Spencer-Comin Can you take a look at the issue given that is is seen with kafka ? For some reason, I can not assign this one to you.

@Spencer-Comin
Copy link
Contributor

RE: #14054 (comment)
I was able to reproduce this error with kafka 3.2.0 and semeru 11.0.21+9. As a side note, with the latest kafka (3.6.1) I do not see the error.

@jonathan-albrecht-ibm
Copy link

Hi @Spencer-Comin, just wanted to check if you are planning on working on this issue. It's currently unassigned.

@Spencer-Comin
Copy link
Contributor

@jonathan-albrecht-ibm I’m off for the next two weeks; once I’m back I’ll continue working on this issue.

@Spencer-Comin
Copy link
Contributor

I've poked around at this for the last few weeks. The issue disappears when I use Semeru 11.0.22+7. I've confirmed that the correct bytecode is produced.

      249: astore        9
      251: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
      254: aload         7
      256: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
      261: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
      264: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
      267: pop
      268: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
      271: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
      274: aload         9
      276: athrow

Full disassembly here: AbstractIndex.11.0.22+7.disassembly.txt

@jonathan-albrecht-ibm
Copy link

Thanks @Spencer-Comin that looks good to me. I'm okay to close this issue now.

@jonathan-albrecht-ibm
Copy link

@Spencer-Comin, just wanted to check if you would like to close this issue. I don't have permission to do that.

Copy link

Issue Number: 14054
Status: Closed
Actual Components: comp:jit, userRaised
Actual Assignees: No one :(
PR Assignees: No one :(

@Spencer-Comin Spencer-Comin self-assigned this Nov 27, 2024
@Spencer-Comin Spencer-Comin reopened this Nov 27, 2024
Copy link

Issue Number: 14054
Status: Closed
Actual Components: comp:jit, userRaised
Actual Assignees: No one :(
PR Assignees: No one :(

@Spencer-Comin
Copy link
Contributor

I thought assigning myself and reclosing the issue would let the bot know that I was the actual assignee, but apparently not

@pshipton
Copy link
Member

pshipton commented Nov 27, 2024

Feel free to ignore comments such as #14054 (comment). They are an AI experiment and mean nothing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests