Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NCDFE in Liberty startup #20444

Open
hangshao0 opened this issue Oct 30, 2024 · 25 comments
Open

NCDFE in Liberty startup #20444

hangshao0 opened this issue Oct 30, 2024 · 25 comments

Comments

@hangshao0
Copy link
Contributor

Java -version output

1CIJAVAVERSION JRE 17 Windows 10 amd64-64 (build 17.0.12+7)
1CIVMVERSION   20240716_793
1CIJ9VMTAG     openj9-0.46.1
1CIJ9VMVERSION 4760d5d320
1CIJITVERSION  j9jit_20240910_1925_jenkins
1CIOMRVERSION  840a9adba_CMPRSS
1CIJCLVERSION  e70fd1fd64a based on jdk-17.0.12+7

Summary of problem

java.lang.NoClassDefFoundError thrown when loading class com.ibm.ws.classloading.internal.providers.Providers

Diagnostic files

java.lang.NoClassDefFoundError: com.ibm.ws.classloading.internal.providers.Providers
        at com.ibm.ws.classloading.internal.AppClassLoader.<init>(AppClassLoader.java:172)
        at com.ibm.ws.classloading.internal.ClassLoaderFactory.createClassLoader(ClassLoaderFactory.java:138)
        at com.ibm.ws.classloading.internal.ClassLoaderFactory.create(ClassLoaderFactory.java:116)
        at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createAppClassLoader(ClassLoadingServiceImpl.java:349)
        at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createTopLevelClassLoader(ClassLoadingServiceImpl.java:321)
        at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createTopLevelClassLoader(ClassLoadingServiceImpl.java:103)
        at com.ibm.ws.classloading.ClassLoaderConfigHelper.createTopLevelClassLoader(ClassLoaderConfigHelper.java:314)
        at com.ibm.ws.app.manager.module.internal.DeployedAppInfoBase.createTopLevelClassLoader(DeployedAppInfoBase.java:635)
        at com.ibm.ws.app.manager.ear.internal.EARDeployedAppInfo.basicCreateAppClassLoader(EARDeployedAppInfo.java:231)
        at com.ibm.ws.app.manager.ear.internal.EARDeployedAppInfo.getAppClassLoader(EARDeployedAppInfo.java:180)
        at com.ibm.ws.app.manager.ear.internal.EARDeployedAppInfo.createEJBModuleClassLoader(EARDeployedAppInfo.java:1364)
        at com.ibm.ws.app.manager.ear.internal.EARDeployedAppInfo.createModuleClassLoader(EARDeployedAppInfo.java:1345)
        at com.ibm.ws.app.manager.module.internal.ExtendedModuleInfoImpl.getClassLoader(ExtendedModuleInfoImpl.java:124)
        at com.ibm.ws.ejbcontainer.osgi.internal.ModuleInitDataAdapter.createModuleInitData(ModuleInitDataAdapter.java:178)
        at com.ibm.ws.ejbcontainer.osgi.internal.ModuleInitDataAdapter.adapt(ModuleInitDataAdapter.java:124)
        at com.ibm.ws.ejbcontainer.osgi.internal.ModuleInitDataAdapter.adapt(ModuleInitDataAdapter.java:43)
        at com.ibm.ws.adaptable.module.internal.AdapterFactoryServiceImpl.adapt(AdapterFactoryServiceImpl.java:202)
        at com.ibm.ws.adaptable.module.internal.AdaptableContainerImpl.adapt(AdaptableContainerImpl.java:176)
        at com.ibm.ws.adaptable.module.internal.InterpretedContainerImpl.adapt(InterpretedContainerImpl.java:205)
        at com.ibm.ws.ejbcontainer.osgi.internal.EJBModuleRuntimeContainerImpl.createModuleMetaData(EJBModuleRuntimeContainerImpl.java:105)
        at com.ibm.ws.app.manager.module.internal.ModuleHandlerBase.createModuleMetaData(ModuleHandlerBase.java:65)
        at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase$ModuleContainerInfoBase.createModuleMetaData(SimpleDeployedAppInfoBase.java:222)
        at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase.preDeployApp(SimpleDeployedAppInfoBase.java:608)
        at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase.installApp(SimpleDeployedAppInfoBase.java:584)
        at com.ibm.ws.app.manager.module.internal.DeployedAppInfoBase.deployApp(DeployedAppInfoBase.java:625)
        at com.ibm.ws.app.manager.ear.internal.EARApplicationHandlerImpl.install(EARApplicationHandlerImpl.java:79)
        at com.ibm.ws.app.manager.internal.statemachine.StartAction.execute(StartAction.java:199)
        at com.ibm.ws.app.manager.internal.statemachine.ApplicationStateMachineImpl.enterState(ApplicationStateMachineImpl.java:1369)
        at com.ibm.ws.app.manager.internal.statemachine.ApplicationStateMachineImpl.run(ApplicationStateMachineImpl.java:912)
        at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:298)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:857)

Diagnostic files here captured by option: -Xdump:system+java+snap:events=throw+systhrow,filter=java/lang/NoClassDefFoundError,msg_filter=com.ibm.ws.classloading.internal.providers.Providers,request=exclusive+prepwalk+serial+preempt

Copy link

Issue Number: 20444
Status: Open
Recommended Components: comp:vm, comp:jclextensions, comp:gc
Recommended Assignees: jasonfengj9, keithc-ca, hangshao0

@hangshao0 hangshao0 changed the title NCDFE in our Liberty startup NCDFE in Liberty startup Oct 30, 2024
@hangshao0
Copy link
Contributor Author

From the tracepoint:

11:56:46.063008409  0x0000000000d23600 j9vm.315            Entry      >0Xd23600 loader 0x1ea71d24b40 class java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock arbitratedLoadClass enter className
11:56:46.063030006  0x0000000000d23600 j9vm.119            Event       send loadClass(java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock), stringObject: 0xff802740 loader: 0x816c7ec8
11:56:46.063063703 *0x0000000000c64900 j9vm.315            Entry      >0Xc64900 loader 0x1ea71d24b40 class com/ibm/ws/classloading/internal/providers/Providers arbitratedLoadClass enter className
11:56:46.063073692 *0x0000000000d23600 j9vm.120            Event       sent loadClass(java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock) --> got 0x80054208
11:56:46.063075972 *0x0000000000c64900 j9vm.119            Event       send loadClass(com/ibm/ws/classloading/internal/providers/Providers), stringObject: 0xff81aa58 loader: 0x816c7ec8
11:56:46.063091050  0x0000000000c64900 j9vm.120            Event       sent loadClass(com/ibm/ws/classloading/internal/providers/Providers) --> got 0x80054208
11:56:46.063093870 *0x0000000000d23600 j9vm.319            Exit       <0Xd23600 loader 0x1ea71d24b40 class java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock arbitratedLoadClass exit foundClass 0x1aa200
11:56:46.063122777 *0x0000000000c64900 j9vm.319            Exit       <0Xc64900 loader 0x1ea71d24b40 class com/ibm/ws/classloading/internal/providers/Providers arbitratedLoadClass exit foundClass 0x0                     <----- foundClass is 0x0 which causes the NoClassDefFoundError
11:56:46.063142825  0x0000000000c64900 j9vm.294            Entry      >setCurrentException index=21 constructorIndex=0 detailMessage=0xff81ab78

There are 2 threads loading classes:
Thread 0xd23600 is loading ReentrantReadWriteLock$WriteLock
Thread 0xc64900 is loading Providers

However, both threads got 0x80054208, which is ReentrantReadWriteLock$WriteLock. This is correct for thread 0xd23600, but incorrect for Thread 0xc64900.

11:56:46.063073692 *0x0000000000d23600 j9vm.120            Event       sent loadClass(java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock) --> got 0x80054208
11:56:46.063091050  0x0000000000c64900 j9vm.120            Event       sent loadClass(com/ibm/ws/classloading/internal/providers/Providers) --> got 0x80054208

The tracepoint is:

Trc_VM_internalFindClass_sentLoadClass(vmThread, classNameLength, className, sendLoadClassResult);

and sendLoadClassResult is from:

sendLoadClassResult = (j9object_t) vmThread->returnValue;

Could you look at why thread 0xc64900 get the same returnValue as thread 0xd23600 ? @thallium

@tjwatson
Copy link
Contributor

We have also observed this strange NoSuchMethodError where it seems to say it is looking for the getPrivateLibraries method on ReentrantReadWriteLock$WriteLock when that method is on the Providers class. That sounds like it could be another flavor of this issue where the thread got the wrong class?

java.lang.NoSuchMethodError: java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock.getPrivateLibraries(Lcom/ibm/wsspi/classloading/ClassLoaderConfiguration;)Ljava/util/List; (loaded from jrt:/java.base by <Bootstrap Loader>) called from class com.ibm.ws.classloading.internal.AppClassLoader (loaded from file:/C:/Users/jazz_build/Build/jazz-build-engines/wasrtc/EBCPROD/build/dev/image/output/wlp/lib/com.ibm.ws.classloading_1.1.95.jar by org.eclipse.osgi.internal.loader.EquinoxClassLoader@b19957cc[com.ibm.ws.classloading:1.1.95.202410071314(id=48)]).
    at com.ibm.ws.classloading.internal.AppClassLoader.<init>(AppClassLoader.java:172)
    at com.ibm.ws.classloading.internal.ClassLoaderFactory.createClassLoader(ClassLoaderFactory.java:138)
    at com.ibm.ws.classloading.internal.ClassLoaderFactory.create(ClassLoaderFactory.java:116)
    at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createAppClassLoader(ClassLoadingServiceImpl.java:349)
    at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createTopLevelClassLoader(ClassLoadingServiceImpl.java:321)
    at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createTopLevelClassLoader(ClassLoadingServiceImpl.java:103)
    at com.ibm.ws.classloading.ClassLoaderConfigHelper.createTopLevelClassLoader(ClassLoaderConfigHelper.java:314)
    at com.ibm.ws.app.manager.module.internal.DeployedAppInfoBase.createTopLevelClassLoader(DeployedAppInfoBase.java:635)
    at com.ibm.ws.app.manager.war.internal.WARDeployedAppInfo.createModuleClassLoader(WARDeployedAppInfo.java:105)
    at com.ibm.ws.app.manager.module.internal.ExtendedModuleInfoImpl.getClassLoader(ExtendedModuleInfoImpl.java:124)
    at com.ibm.ws.container.service.annocache.internal.ModuleAnnotationsImpl.<init>(ModuleAnnotationsImpl.java:198)
    at com.ibm.ws.container.service.annocache.internal.ModuleAnnotationsImpl.<init>(ModuleAnnotationsImpl.java:161)
    at com.ibm.ws.container.service.annocache.internal.WebAnnotationsImpl.<init>(WebAnnotationsImpl.java:91)
    at com.ibm.ws.container.service.annocache.internal.WebAnnotationsAdapterImpl.adapt(WebAnnotationsAdapterImpl.java:63)
    at com.ibm.ws.container.service.annocache.internal.WebAnnotationsAdapterImpl.adapt(WebAnnotationsAdapterImpl.java:35)
    at com.ibm.ws.adaptable.module.internal.AdapterFactoryServiceImpl.adapt(AdapterFactoryServiceImpl.java:202)
    at com.ibm.ws.adaptable.module.internal.AdaptableContainerImpl.adapt(AdaptableContainerImpl.java:176)
    at com.ibm.ws.container.service.annocache.AnnotationsBetaHelper.getWebAnnotations(AnnotationsBetaHelper.java:270)
    at com.ibm.ws.webcontainer.osgi.container.config.WebAppConfigurator.setWebAnnotations(WebAppConfigurator.java:88)
    at com.ibm.ws.webcontainer.osgi.container.config.WebAppConfigurator.configure(WebAppConfigurator.java:424)
    at com.ibm.ws.webcontainer.osgi.container.config.WebAppConfigurationAdapter.adapt(WebAppConfigurationAdapter.java:121)
    at com.ibm.ws.webcontainer.osgi.container.config.WebAppConfigurationAdapter.adapt(WebAppConfigurationAdapter.java:45)
    at com.ibm.ws.adaptable.module.internal.AdapterFactoryServiceImpl.adapt(AdapterFactoryServiceImpl.java:202)
    at com.ibm.ws.adaptable.module.internal.AdaptableContainerImpl.adapt(AdaptableContainerImpl.java:176)
    at com.ibm.ws.webcontainer.osgi.WebContainer.createModuleMetaData(WebContainer.java:872)
    at com.ibm.ws.app.manager.module.internal.ModuleHandlerBase.createModuleMetaData(ModuleHandlerBase.java:65)
    at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase$ModuleContainerInfoBase.createModuleMetaData(SimpleDeployedAppInfoBase.java:222)
    at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase.preDeployApp(SimpleDeployedAppInfoBase.java:608)
    at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase.installApp(SimpleDeployedAppInfoBase.java:584)
    at com.ibm.ws.app.manager.module.internal.DeployedAppInfoBase.deployApp(DeployedAppInfoBase.java:625)
    at com.ibm.ws.app.manager.war.internal.WARApplicationHandlerImpl.install(WARApplicationHandlerImpl.java:67)
    at com.ibm.ws.app.manager.internal.statemachine.StartAction.execute(StartAction.java:199)
    at com.ibm.ws.app.manager.internal.statemachine.ApplicationStateMachineImpl.enterState(ApplicationStateMachineImpl.java:1369)
    at com.ibm.ws.app.manager.internal.statemachine.ApplicationStateMachineImpl.run(ApplicationStateMachineImpl.java:912)
    at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:298)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:857)

@hangshao0
Copy link
Contributor Author

@TobiAjila @gacholio Does this ring any bell to you ?

2 threads loading 2 different classes simultaneously. Thread1 loaded the correct class, but thread2 got the class loaded by thread1.

@tjwatson
Copy link
Contributor

For Liberty we have three levels of class loaders

  1. JVM loaders (boot, app class path etc)
  2. OSGi (Equinox) Bundle class loaders
  3. Liberty application class loaders

All three levels are storing classes into the Open J9 shared class cache. So far I only recall seeing these strange errors for the classes loaded by the Equinox bundle class loaders. This makes me suspect there is something specific to how Equinox uses the SharedClassURLHelper that is causing something to go wrong. There we store the classes with this call:

https://github.com/eclipse-equinox/equinox/blob/9ee49aa9caa8670b7d6af6fad7d65f7ea387c5fb/bundles/org.eclipse.osgi/container/src/org/eclipse/osgi/internal/cds/CDSHookImpls.java#L145

The call to com.ibm.oti.shared.SharedClassURLHelper.storeSharedClass(String, URL, Class) is done outside of any locks that would be held by the class loader for the class name being stored. But there proper locks while defining the class such that two threads do not try to call ClassLoader.defineClass multiple times for the same class. And only the thread that "won" in defining the class should be the thread that attempts to store the class after it has been defined.

Similarly the method com.ibm.oti.shared.SharedClassURLHelper.findSharedClass(String, URL, String) is used to find a shared class with this call:

https://github.com/eclipse-equinox/equinox/blob/9ee49aa9caa8670b7d6af6fad7d65f7ea387c5fb/bundles/org.eclipse.osgi/container/src/org/eclipse/osgi/internal/cds/CDSBundleFile.java#L155

Again, this call has no locks held by the class loader while calling the SharedClassURLHelper.findSharedClass. Is it possible there is a threading issue here with either looking up or storing shared classes?

@hangshao0
Copy link
Contributor Author

re comment

There is only one copy of com/ibm/ws/classloading/internal/providers/Providers in the shared cache. It was added a previous JVM, not by the current JVM, which means this JVM did not call storeSharedClass() on this class. I have dumped the com/ibm/ws/classloading/internal/providers/Providers romClass and compared it against its java code, it looks correct to me.

Also there is only one copy of java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock. It is loaded by bootstrap class loader and it looks correct to me.

I believe this JVM called findSharedClass() on com/ibm/ws/classloading/internal/providers/Providers, however it is a hashtable look up where the class name is the key. I don't see how findSharedClass(String className1) could return a class whose name is not className1.

@hangshao0
Copy link
Contributor Author

I believe this JVM called findSharedClass() on com/ibm/ws/classloading/internal/providers/Providers

Actually findSharedClass() is not called on com/ibm/ws/classloading/internal/providers/Providers by this JVM. There should be JCL: SharedClassURLHelperImpl findSharedClassImpl entry and exit trace points in #20444 (comment) if it is called (like below), but I don't see them for com/ibm/ws/classloading/internal/providers/Providers. So neither storeSharedClass() nor findSharedClass() is involved here.


11:56:46.062322947  0x0000000000c64900 j9vm.119            Event       send loadClass(com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1), stringObject: 0xff818ec8 loader: 0x816c7ec8
11:56:46.062344575  0x0000000000c64900 j9jcl.108           Entry      >JCL: SharedClassURLHelperImpl findSharedClassImpl: Entering for helperID 49
11:56:46.062358974  0x0000000000c64900 j9jcl.539           Entry      >JCL: com.ibm.oti.shared getCachedURL: Entering for helperID=49
11:56:46.062368133  0x0000000000c64900 j9jcl.154           Exit       <JCL: com.ibm.oti.shared getCachedURL: Exiting with found cached URL
11:56:46.062392360  0x0000000000c64900 j9shr.1293          Event       RMI locateROMClass: Locate succeeded. Returning 0x1ea6ff2ca04. FoundAtIndex=0. StaleCPEI=0x0. Locate request for ROMClass com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1 from helper ID 49 with cpeIndex -1.
11:56:46.062402089  0x0000000000c64900 j9shr.1297          Event       CM findROMClass: class com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1 found at address 0x1ea6d70b0f8 with classpath index 0. The classpath id is 49.
11:56:46.062418908  0x0000000000c64900 j9jcl.110           Exit       <JCL: SharedClassURLHelperImpl findSharedClassImpl: Exiting with TRUE
11:56:46.062545145  0x0000000000c64900 j9vm.1              Entry      >Create RAM class from ROM class 0x1ea6d70b0f8 in class loader 0x1ea71d24b40
11:56:46.062553204  0x0000000000c64900 j9vm.80             Event       ROM class 0x1ea6d70b0f8 is named com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1
11:56:46.062564743  0x0000000000c64900 j9vm.88             Event       Loaded implemented interface java/lang/Runnable (RAM class=0x65100)
11:56:46.062659994  0x0000000000c64900 j9vm.2              Exit       <Created RAM class 0xd11100 from ROM class 0x1ea6d70b0f8
11:56:46.062692960  0x0000000000c64900 j9vm.120            Event       sent loadClass(com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1) --> got 0x8149e558

The normal sequence should be the class loader calls findSharedClass() to see if the class is in shared cache, if not, then load the class from disk. However, in this case the wrong class is returned even before findSharedClass() is called. I don't see the JVM creating the wrong class indicated by trace j9vm.1, j9vm2 for com/ibm/ws/classloading/internal/providers/Providers either. I guess the wrong class could be from the cached data structure mapping between the class name and the actual class/class bytes in the EquinoxClassLoader.

@tjwatson
Copy link
Contributor

tjwatson commented Oct 31, 2024

I guess the wrong class could be from the cached data structure mapping between the class name and the actual class/class bytes in the EquinoxClassLoader.

Because Equinox class loaders are a graph that may contain cycles it does not hold locks for an initial call to java.lang.ClassLoader.findLoadedClass(String). Here is the sequence the Equinox loader uses:

  1. Without holding any lock check ClassLoader.findLoadedClass(String) if that returns non-null then the class is returned; otherwise continue.
  2. Without holding any lock check SharedClassURLHelper.findSharedClass if that returns non-null then continue to step 3; otherwise the class has not been stored yet, so we do normal byte code search.
  3. Obtain a class-loader/class-name pair for the class about to be defined with the shared class byte[].
  4. Call ClassLoader.findLoadedClass(String) again to make sure another thread didn't define the class; if that returns null then call ClassLoader.defineClass with the shared class byte[].
  5. release the class-loader/class-name pair for the class that was defined.
  6. Return the shared Class object

Could there be some inconsistency with allowing another thread to get a non-null result from the call to ClassLoader.findLoadedClass(String) in step 1 while we are trying to define the shared class in step 4? For performance reasons, I would like to avoid having to get the class-loader/class-name pair lock to call that first findLoadedClass.

@hangshao0
Copy link
Contributor Author

I guess the wrong class could be from the cached data structure mapping between the class name and the actual class/class bytes in the EquinoxClassLoader.

I realized the wrong class returned is java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock, which is a bootstrap class. It is not a class that has just been loaded by thread 0xd23600 . It is a class that has already been loaded by bootstrap class loader before. It is very likely both threads get the class from bootstraploader.findLoadedClass(String classname) through delegation all the way upto the bootstrap class loader. It is just thread 0xd23600 is actually looking for java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock and but thread 0xc64900 is looking for a different class. The class name might becomes wrong somewhere in between.

Could there be some inconsistency with allowing another thread to get a non-null result from the call to ClassLoader.findLoadedClass(String) in step 1 while we are trying to define the shared class in step 4? For performance reasons, I would like to avoid having to get the class-loader/class-name pair lock to call that first findLoadedClass.

It is possible that a thread in step 1 can get a not null class while another thread is defining the class in step 4 (the thread should already be near the end of step 4 after the class is already added to the hashtable keyed on the class name, the class name in step 1 and 4 should be the same).

@tjwatson
Copy link
Contributor

tjwatson commented Nov 1, 2024

The class name might becomes wrong somewhere in between.

I'm out of ideas on the Equinox side how that could happen. I assume that must be going wrong down in the JVM?

@hangshao0
Copy link
Contributor Author

hangshao0 commented Nov 1, 2024

I'm out of ideas on the Equinox side how that could happen. I assume that must be going wrong down in the JVM?

This is a really strange issue. By looking at the code, I don't see how the JVM could be wrong about the class name either. Not sure if things go wrong inside JVM or before calling into JVM. But we can add assertions in the JVM code that the class being searched for and the class returned has the same name.

@hangshao0
Copy link
Contributor Author

@thallium is looking at adding the assertions.

@thallium
Copy link
Contributor

thallium commented Nov 4, 2024

@tjwatson Hi, just want to confirm that you're looking for a build of JDK17 on windows?

@hangshao0
Copy link
Contributor Author

hangshao0 commented Nov 4, 2024

@tjwatson Hi, just want to confirm that you're looking for a build of JDK17 on windows?

Thomas, @thallium is looking at providing a debugging build for you if you want to try.

@tjwatson
Copy link
Contributor

tjwatson commented Nov 4, 2024

@tjwatson Hi, just want to confirm that you're looking for a build of JDK17 on windows?

Thomas, @thallium is looking at providing a debugging build for you if you want to try.

We have a very large set of configurations we run our tests on ranging from Java 8 to Java 21. If we got a test build we would have to figure out a way to update some set of them to run with the debugging build.

I also have a separate build setup just for InstantOn where I have tested out development builds of OpenJ9, but that setup only runs instantOn tests and I am unsure we have seen this type of failure using Semeru InstantOn with any regularity.

Regardless, if you have a developer build you want us to try let me know and I can see if there is something we can do.

@thallium
Copy link
Contributor

thallium commented Nov 6, 2024

@tjwatson Hi, I've built JDK 17 and JDK 21 on both Linux and Windows:

@tjwatson
Copy link
Contributor

tjwatson commented Nov 7, 2024

@tjwatson Hi, I've built JDK 17 and JDK 21 on both Linux and Windows:

What do we expect to see if the issue happens when running these builds?

@thallium
Copy link
Contributor

thallium commented Nov 7, 2024

What do we expect to see if the issue happens when running these builds?

It should trigger an assertion and I also added a trace point containing relevant information. You can find my changes here: #20501

@hangshao0
Copy link
Contributor Author

hangshao0 commented Nov 7, 2024

It should trigger an assertion and I also added a trace point containing relevant information. You can find my changes here: #20501

@thallium It will be more helpful if you can add the message about assertion trace id and line number etc which would be printed out in the console so that Thomas can easily identify.

@thallium
Copy link
Contributor

thallium commented Nov 7, 2024

The assertion is just Assert_VM_true(0); so what will be printed to the console will be something like ** ASSERTION FAILED ** at .../openj9/runtime/vm/classsupport.c:<line number>: (0), where line number could be 396 or 414.

The trace point added is:

TraceException=Trc_VM_Class_Found_Has_Diff_Name Overhead=1 Level=1 Template="Expected class name: %.*s, found class name: %.*s, fromClassTable=%d"

@tjwatson
Copy link
Contributor

tjwatson commented Nov 7, 2024

Will any additional -Xdump options be necessary or helpful along with this build? Or are there other options I need to set to make the trace enabled?

@hangshao0
Copy link
Contributor Author

Will any additional -Xdump options be necessary or helpful along with this build? Or are there other options I need to set to make the trace enabled?

The new trace are enabled by default. The trace assertion failure will automatically trigger diagnostic files by default. No more option needed.

@tjwatson
Copy link
Contributor

We have observed this strange verify error that I think can be explained by the wrong class being returned:

com.ibm.ws.ssl.internal.SSLComponent java.lang.VerifyError: JVMVRFY021 thrown object not throwable; class=com/ibm/ws/ssl/internal/SSLComponent, method=<init>()V, pc=1
Exception Details:
  Location:
    com/ibm/ws/ssl/internal/SSLComponent.<init>()V [@1](https://wasrtc.hursley.ibm.com:9443/jazz/users/1): JBinvokespecial
  Reason:
    Type 'uninitializedThis' (current frame, stack[0]) is not assignable to 'com/ibm/ws/ssl/internal/GenericSSLConfigService'
  Current Frame:
    bci: [@1](https://wasrtc.hursley.ibm.com:9443/jazz/users/1)
    flags: { }
    locals: { 'com/ibm/ws/ssl/internal/SSLComponent' }
    stack: { 'uninitializedThis' }
    at java.base/java.lang.J9VMInternals.prepareClassImpl(Native Method)
    at java.base/java.lang.J9VMInternals.prepare(J9VMInternals.java:377)
    at java.base/java.lang.Class.getConstructors(Class.java:742)
    at org.apache.felix.scr.impl.inject.internal.ComponentConstructorImpl.<init>(ComponentConstructorImpl.java:98)
    at org.apache.felix.scr.impl.inject.internal.ComponentMethodsImpl.initComponentMethods(ComponentMethodsImpl.java:110)
    at org.apache.felix.scr.impl.manager.AbstractComponentManager.initDependencyManagers(AbstractComponentManager.java:1029)
    at org.apache.felix.scr.impl.manager.AbstractComponentManager.collectDependencies(AbstractComponentManager.java:1057)
    at org.apache.felix.scr.impl.manager.SingleComponentManager.getServiceInternal(SingleComponentManager.java:955)
    at org.apache.felix.scr.impl.manager.AbstractComponentManager.activateInternal(AbstractComponentManager.java:776)
    at org.apache.felix.scr.impl.manager.AbstractComponentManager.enableInternal(AbstractComponentManager.java:674)
    at org.apache.felix.scr.impl.manager.AbstractComponentManager.enable(AbstractComponentManager.java:437)
    at org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.enableComponents(ConfigurableComponentHolder.java:671)
    at org.apache.felix.scr.impl.BundleComponentActivator.initialEnable(BundleComponentActivator.java:310)
    at org.apache.felix.scr.impl.Activator.loadComponents(Activator.java:613)
    at org.apache.felix.scr.impl.Activator.access$200(Activator.java:75)
    at org.apache.felix.scr.impl.Activator$ScrExtension.start(Activator.java:480)
    at org.apache.felix.scr.impl.AbstractExtender.createExtension(AbstractExtender.java:196)
    at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:169)
    at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:49)
    at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:488)
    at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:1)
    at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:232)
    at org.osgi.util.tracker.BundleTracker$Tracked.bundleChanged(BundleTracker.java:450)
    at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:949)
    at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234)
    at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:151)
    at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEventPrivileged(EquinoxEventPublisher.java:229)
    at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:138)
    at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:130)
    at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor.publishModuleEvent(EquinoxContainerAdaptor.java:217)
    at org.eclipse.osgi.container.Module.publishEvent(Module.java:499)
    at org.eclipse.osgi.container.Module.start(Module.java:486)
    at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel$2.run(ModuleContainer.java:1852)
    at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel$1.execute(ModuleContainer.java:1840)
    at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1845)
    at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1789)
    at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.doContainerStartLevel(ModuleContainer.java:1750)
    at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.dispatchEvent(ModuleContainer.java:1672)
    at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.dispatchEvent(ModuleContainer.java:1)
    at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234)
    at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:345)

Note that SSLComponent extends GenericSSLConfigService from the same package, loaded by the same class loader, so I don't see how the above error message can be true.

The stack frame org.apache.felix.scr.impl.inject.internal.ComponentConstructorImpl.<init>(ComponentConstructorImpl.java:98) is calling getConstructors on the SSLComponent class.

@hangshao0
Copy link
Contributor Author

The error in comment is more related to verifier that does not allow uninitialized this to be assigned to the super class.

tjwatson added a commit to tjwatson/equinox that referenced this issue Nov 15, 2024
This is related to Open J9 issue
eclipse-openj9/openj9#20444

There is a concern that the initial call to findLoadedClass
by Equinox is not protected by the same loader-classname lock
as the later calls to findLoadedClass/defineClass
calls.

There are concerns that the unprotected findLoadedClass call
could catch the JVM in an invalid state if there is another
thread in the middle of defining the class.

This changes replaces the clunky way of doing loader-classname
locks with a more efficient strategy. This way we can use
this lock around the initial call to findLoadedClass
without too much concern over the performance impact.

This change also removes some of the old code that still
supproted Java 6.  It is always assumed we now can register
the class loaders as parallel capable.
@tjwatson
Copy link
Contributor

The error in comment is more related to verifier that does not allow uninitialized this to be assigned to the super class.

Similar to the exception, I cannot make any sense of that. Nothing in the code is trying to assign unninialized this to the super class type. It seems more likely we got the wrong class back and trying to assign to the wrong class fails.

tjwatson added a commit to tjwatson/equinox that referenced this issue Nov 15, 2024
This is related to Open J9 issue
eclipse-openj9/openj9#20444

There is a concern that the initial call to findLoadedClass
by Equinox is not protected by the same loader-classname lock
as the later calls to findLoadedClass/defineClass
are.

There are concerns that the unprotected findLoadedClass call
could catch the JVM in an invalid state if there is another
thread in the middle of defining the class.

This change replaces the clunky way of doing loader-classname
locks with a more efficient strategy. This way we can use
this lock around the initial call to findLoadedClass
without too much concern over the performance impact.

This change also removes some of the old code that still
supported Java 6.  It is always assumed we now can register
the class loaders as parallel capable.
tjwatson added a commit to tjwatson/equinox that referenced this issue Nov 20, 2024
This is related to Open J9 issue
eclipse-openj9/openj9#20444

There is a concern that the initial call to findLoadedClass
by Equinox is not protected by the same loader-classname lock
as the later calls to findLoadedClass/defineClass
are.

There are concerns that the unprotected findLoadedClass call
could catch the JVM in an invalid state if there is another
thread in the middle of defining the class.

This change replaces the clunky way of doing loader-classname
locks with a more efficient strategy. This way we can use
this lock around the initial call to findLoadedClass
without too much concern over the performance impact.

This change also removes some of the old code that still
supported Java 6.  It is always assumed we now can register
the class loaders as parallel capable.
tjwatson added a commit to tjwatson/equinox that referenced this issue Dec 2, 2024
This is related to Open J9 issue
eclipse-openj9/openj9#20444

There is a concern that the initial call to findLoadedClass
by Equinox is not protected by the same loader-classname lock
as the later calls to findLoadedClass/defineClass
are.

There are concerns that the unprotected findLoadedClass call
could catch the JVM in an invalid state if there is another
thread in the middle of defining the class.

This change replaces the clunky way of doing loader-classname
locks with a more efficient strategy. This way we can use
this lock around the initial call to findLoadedClass
without too much concern over the performance impact.

This change also removes some of the old code that still
supported Java 6.  It is always assumed we now can register
the class loaders as parallel capable.
tjwatson added a commit to eclipse-equinox/equinox that referenced this issue Dec 2, 2024
This is related to Open J9 issue
eclipse-openj9/openj9#20444

There is a concern that the initial call to findLoadedClass
by Equinox is not protected by the same loader-classname lock
as the later calls to findLoadedClass/defineClass
are.

There are concerns that the unprotected findLoadedClass call
could catch the JVM in an invalid state if there is another
thread in the middle of defining the class.

This change replaces the clunky way of doing loader-classname
locks with a more efficient strategy. This way we can use
this lock around the initial call to findLoadedClass
without too much concern over the performance impact.

This change also removes some of the old code that still
supported Java 6.  It is always assumed we now can register
the class loaders as parallel capable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants