Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usage of AsyncGetCallTrace leads to livelock in 'getSendSlotsFromSignature' #20577

Open
jbachorik opened this issue Nov 12, 2024 · 14 comments
Open

Comments

@jbachorik
Copy link

jbachorik commented Nov 12, 2024

Problem description

When using profiler that calls AsyncGetCallTrace in a signal handler, the profiled application locks up sooner or later.

Inspection of the locked up application in gdb reveals the following stacktrace and an attempt to resume the execution is unsuccessful as the thread is blocked in getSendSlotsFromSignature:

(gdb) thread 101
[Switching to thread 101 (Thread 0x7f4612627700 (LWP 4187263))]
#0  0x00007f468f249a9c in getSendSlotsFromSignature () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libj9jit29.so
(gdb) bt
#0  0x00007f468f249a9c in getSendSlotsFromSignature () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libj9jit29.so
#1  0x00007f468f22e6b2 in jitWalkStackFrames () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libj9jit29.so
#2  0x00007f468faded66 in walkStackFrames () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libj9vm29.so
#3  0x00007f469405ef36 in protectedASGCT () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libjvm.so
#4  0x00007f468f9f53c9 in omrsig_protect () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libj9prt29.so
#5  0x00007f469405f244 in AsyncGetCallTrace () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libjvm.so
#6  0x00007f4665b49d05 in Profiler::getJavaTraceAsync (this=0x7f46908c5b30, ucontext=0x1154240, frames=0x7f4691594eb0, max_depth=512, java_ctx=0x1154160, truncated=0x1154123) at /home/bits/go/src/github.com/DataDog/java-profiler/ddprof-lib/src/main/cpp/profiler.cpp:424
#7  0x00007f4665b4affe in Profiler::recordSample (this=0x7f46908c5b30, ucontext=0x1154240, counter=10000000, tid=4187263, event_type=0, call_trace_id=0, event=0x11541f0) at /home/bits/go/src/github.com/DataDog/java-profiler/ddprof-lib/src/main/cpp/profiler.cpp:722
#8  0x00007f4665b4f12f in CTimer::signalHandler (signo=27, siginfo=0x1154370, ucontext=0x1154240) at /home/bits/go/src/github.com/DataDog/java-profiler/ddprof-lib/src/main/cpp/ctimer_linux.cpp:223
#9  <signal handler called>
#10 0x00007f468f221834 in fast_jitMethodMonitorEntry () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libj9jit29.so
#11 0x00007f468f231eab in jitMethodMonitorEntry () from /home/bits/.sdkman/candidates/java/11.0.24-sem/lib/default/libj9jit29.so
#12 0x00000007345e5c80 in ?? ()
#13 0x00000007346d54a8 in ?? ()
#14 0x0000000670004d98 in ?? ()
#15 0x00000006700045a8 in ?? ()
#16 0x000000067086c488 in ?? ()
#17 0x00007f4613ace6b5 in ?? ()
#18 0x00000001c0c16d38 in ?? ()
#19 0x0000000734639060 in ?? ()
#20 0x0000000670004d98 in ?? ()
#21 0x00000007346d54a8 in ?? ()
#22 0x0000000000000000 in ?? ()
(gdb) next
Single stepping until exit from function getSendSlotsFromSignature,
which has no line number information.
^C
Thread 1 "java" received signal SIGINT, Interrupt.
[Switching to Thread 0x7f46940e1300 (LWP 4100003)]
__pthread_clockjoin_ex (threadid=139941108295424, thread_return=0x7ffd38af7c98, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
145     pthread_join_common.c: No such file or directory.

This particular stacktrace is from Java 11.0.24 but the same behaviour is observed on 8.0.422, 17.0.12 and 21.0.4

Steps to reproduce

  1. Download Datadog tracer/profiler agent (https://github.com/DataDog/dd-trace-java/releases/tag/v1.42.1)
  2. Download Renaissance benchmark (https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.15.0/renaissance-mit-0.15.0.jar)
  3. Run profiling session like java -javaagent:<pathto>/dd-java-agent-1.42.1.jar -Ddd.profiling.enabled=true -Ddd.profiling.ddprof.enabled=true -Ddd.profiling.upload.period=10 -Ddd.profiling.start-force-first=true -jar <path-to>/renaissance-mit-0.15.0.jar akka-uct -r 50000 - mind you, the lock up is usually not happening immediately, but within 15-20 minutes it is guaranteed, according to my experiments.

Note: The current version of dd-trace-java agent will not start on JDK 21.0.4 - it is a known issue and I tested 21.0.4 with a patched version of the agent.

Copy link

Issue Number: 20577
Status: Open
Recommended Components: comp:vm, comp:test, comp:gc
Recommended Assignees: gacholio, jasonfengj9, babsingh

@pshipton
Copy link
Member

@hzongaro pls take a look
@tajila @gacholio

@gacholio
Copy link
Contributor

UDATA
getSendSlotsFromSignature(const U_8* signature)
{
UDATA sendArgs = 0;
UDATA i = 1; /* 1 to skip the opening '(' */
for (; ; i++) {
switch (signature[i]) {
case ')':
return sendArgs;
case '[':
/* skip all '['s */
for (i++; signature[i] == '['; i++);
if (signature[i] == 'L') {
/* FALL THRU */
} else {
sendArgs++;
break;
}
case 'L':
for (i++; signature[i] != ';'; i++);
sendArgs++;
break;
case 'D':
case 'J':
sendArgs += 2;
break;
default:
/* any other primitive type */
sendArgs++;
break;
}
}
}

The code assumes a valid signature as input. It's not clear to me how this could loop forever - a crash I could understand.

@jbachorik
Copy link
Author

Could this be caused by passing garbage to signature ?
Although, I would expect it running out of valid address space pretty soon.

Unfortunately, this is happening consistently - this function is the last on stack and, as I mentioned, when I attach via gdb and try to 'step-out' the action will block indefinitely and I have to interrupt it.

@hzongaro
Copy link
Member

@a7ehuo, may I ask you to take a look at at this problem?

@a7ehuo
Copy link
Contributor

a7ehuo commented Nov 15, 2024

I can reproduce the hang problem with JDK11. signature is empty.

  55   Thread 0x7f0f4da77700 (LWP 448993) "UCT-akka.actor." getSendSlotsFromSignature (signature=0x20e000b "")
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/util/sendslot.c:33

(gdb) fr 0
#0  getSendSlotsFromSignature (signature=0x20e000b "")
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/util/sendslot.c:33
33		for (; ; i++) {
(gdb) p *signature
$23 = 0 '\000'
(gdb) p i
$24 = 6982497
(gdb) p sendArgs
$25 = 6982497

(gdb) up
#1  0x00007f0fcb795cba in jitWalkResolveMethodFrame (walkState=0x7f0f4da73d60)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/codert_vm/jswalk.c:1173
1173			pendingSendSlots = getSendSlotsFromSignature(J9UTF8_DATA(signature)) + (walkStackedReceiver ? 1 : 0);
(gdb) p signature
$32 = (J9UTF8 *) 0x20e0009

> !J9UTF8 0x20e0009
J9UTF8 at 0x20e0009 {
  Fields for J9UTF8:
	0x0: U16 length = 0x0000 (0)
	0x2: U8[] data = !j9x 0x00000000020E000B
}
> !j9x 0x00000000020E000B
0x020E000B :  0000000000000000 0000000000000000 [ ................ ] 
0x020E001B :  0000000000000000 0000000000000000 [ ................ ] 
(gdb) bt
#0  getSendSlotsFromSignature (signature=0x20e000b "")
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/util/sendslot.c:33
#1  0x00007f0fcb795cba in jitWalkResolveMethodFrame (walkState=0x7f0f4da73d60)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/codert_vm/jswalk.c:1173
#2  walkTransitionFrame (walkState=0x7f0f4da73d60)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/codert_vm/jswalk.c:383
#3  jitWalkStackFrames (walkState=0x7f0f4da73d60) at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/codert_vm/jswalk.c:162
#4  0x00007f0fd0600fba in walkStackFrames (currentThread=<optimized out>, walkState=0x7f0f4da73d60)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/vm/swalk.c:388
#5  0x00007f0fd07cb101 in protectedASGCT (portLib=portLib@entry=0x7f0fd07f2400 <j9portLibrary>, arg=arg@entry=0x7f0f4da74150)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/j9vm/asgct.cpp:207
#6  0x00007f0fd03b62f1 in omrsig_protect (portLibrary=0x7f0fd07f2400 <j9portLibrary>, 
    fn=0x7f0fd07caf20 <protectedASGCT(J9PortLibrary*, void*)>, fn_arg=0x7f0f4da74150, 
    handler=0x7f0fd07caf10 <emptySignalHandler(J9PortLibrary*, U_32, void*, void*)>, handler_arg=0x0, flags=505, 
    result=0x7f0f4da74148) at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/port/unix/omrsignal.c:425
#7  0x00007f0fd07cb408 in AsyncGetCallTrace (trace=0x7f0f4da74250, depth=512, ucontext=0x7f0f4da74400)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/j9vm/asgct.cpp:231
#8  0x00007f0fbc7a07a6 in Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int, StackContext*, bool*) ()
   from /tmp/libjavaProfiler17103205139225625269.so
#9  0x00007f0fbc7a1727 in Profiler::recordSample(void*, unsigned long long, int, int, unsigned int, Event*) ()
   from /tmp/libjavaProfiler17103205139225625269.so
#10 0x00007f0fbc7758eb in CTimer::signalHandler(int, siginfo_t*, void*) () from /tmp/libjavaProfiler17103205139225625269.so
#11 <signal handler called>
#12 jitWriteBarrierStoreGenerationalAndConcurrentMark ()
    at /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/vm/runtime/codert_vm/xnathelp.s:9526
#13 0x000000000101f400 in ?? ()
#14 0x00007f0f4da74b60 in ?? ()
#15 0x00000000ab038338 in ?? ()
#16 0x00000000a58c14f8 in ?? ()
#17 0x00000000a35f0000 in ?? ()
#18 0x0000000007a48338 in ?? ()
#19 0x00000000f4949500 in ?? ()
#20 0x00000000f49494f0 in ?? ()
#21 0x00007f0fd075f72a in j9_void_void ()
   from /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9vm29.so
#22 0x00007f0f00000000 in ?? ()
#23 0x00000000a4263b60 in ?? ()
#24 0x0000000000000000 in ?? ()
(gdb) 

@gacholio
Copy link
Contributor

The caller at line 1173 is the normal case, not OSR.

@gacholio
Copy link
Contributor

Can you get a java stack dump at the point of failure?

@a7ehuo
Copy link
Contributor

a7ehuo commented Nov 15, 2024

Can you get a java stack dump at the point of failure?

Do you mean walkState->currentThread or walkState->walkThread? They are the same here 0x101f400. jdmpview can't print the stack

(gdb) p *walkState
$36 = {previous = 0x0, walkThread = 0x101f400, javaVM = 0x7f0fcc03fc30, flags = 3932193, bp = 0x107fd00, unwindSP = 0x30, 
  pc = 0x0, nextPC = 0x0, sp = 0x107fce0, arg0EA = 0x107fb38, literals = 0x0, walkSP = 0x107fce0, argCount = 0, 
  constantPool = 0x15fe63ff0, method = 0xa42b93d0, jitInfo = 0x0, frameFlags = 4067081520, resolveFrameFlags = 4067081520, 
  skipCount = 0, maxFrames = 512, userData1 = 0x7f0fcd442860, userData2 = 0x0, userData3 = 0x0, userData4 = 0x0, framesWalked = 0, 
  frameWalkFunction = 0x7f0fd07cb1d0 <asyncFrameIterator(J9VMThread*, J9StackWalkState*)>, objectSlotWalkFunction = 0x0, 
  returnAddressWalkFunction = 0x0, cache = 0x0, restartPoint = 0x0, restartException = 0x0, inlinerMap = 0x0, inlineDepth = 0, 
  cacheCursor = 0x0, decompilationRecord = 0x0, registerEAs = {jit_rax = 0x0, jit_rbx = 0x0, jit_rcx = 0x0, jit_rdx = 0x0, 
    jit_rdi = 0x0, jit_rsi = 0x0, jit_rbp = 0x0, jit_rsp = 0x0, jit_r8 = 0x0, jit_r9 = 0x0, jit_r10 = 0x0, jit_r11 = 0x0, 
    jit_r12 = 0x0, jit_r13 = 0x0, jit_r14 = 0x0, jit_r15 = 0x0}, walkedEntryLocalStorage = 0x7f0f4da74b80, 
  i2jState = 0x7f0f4da74b90, decompilationStack = 0x0, pcAddress = 0x107fcf8, outgoingArgCount = 0, objectSlotBitVector = 0x0, 
  elsBitVector = 0, savedObjectSlotWalkFunction = 0x0, bytecodePCOffset = -1, 
  dropToCurrentFrame = 0x7f0fd0600440 <dropToCurrentFrame>, j2iFrame = 0x0, previousFrameFlags = 4067081520, slotIndex = -1, 
  slotType = 4, currentThread = 0x101f400, linearSlotWalker = 0x0, inlinedCallSite = 0x0, stackMap = 0x0, inlineMap = 0x0, 
  loopBreaker = 11984}
(gdb) p walkState->currentThread
$37 = (struct J9VMThread *) 0x101f400
(gdb) p walkState->walkThread
$38 = (struct J9VMThread *) 0x101f400
> !stack 0x101f400
Nov 14, 2024 5:23:12 PM com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
WARNING: CorruptDataException thrown walking stack. walkThread = 0x000000000101F400
com.ibm.j9ddr.corereaders.memory.MemoryFault: Memory Fault reading 0x15FE63FF0 
	at openj9.dtfj/com.ibm.j9ddr.corereaders.memory.AbstractMemory.getBytesAt(AbstractMemory.java:184)
	at openj9.dtfj/com.ibm.j9ddr.corereaders.memory.AbstractMemory.getBytesAt(AbstractMemory.java:176)
	at openj9.dtfj/com.ibm.j9ddr.corereaders.memory.AbstractMemory.getLongAt(AbstractMemory.java:241)
	at openj9.dtfj/com.ibm.j9ddr.corereaders.memory.ProcessAddressSpace.getPointerAt(ProcessAddressSpace.java:75)
	at com.ibm.j9ddr.vm29.pointer.AbstractPointer.getPointerAtOffset(AbstractPointer.java:360)
	at com.ibm.j9ddr.vm29.pointer.generated.J9ConstantPoolPointer.ramClass(Unknown Source)
	at com.ibm.j9ddr.vm29.j9.ConstantPoolHelpers.J9_CLASS_FROM_CP(ConstantPoolHelpers.java:76)
	at com.ibm.j9ddr.vm29.j9.ConstantPoolHelpers.J9_CLASS_FROM_METHOD(ConstantPoolHelpers.java:81)
	at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalkerUtils.swPrintMethod(StackWalkerUtils.java:160)
	at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalkerUtils.swPrintMethod(StackWalkerUtils.java:153)
	at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker$StackWalker_29_V0.printFrameType(StackWalker.java:1249)
	at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker$StackWalker_29_V0.walkJITJNICalloutFrame(StackWalker.java:1265)
	at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker$StackWalker_29_V0.commonWalker(StackWalker.java:455)
	at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker$StackWalker_29_V0.walkStackFrames(StackWalker.java:196)
	at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker.walkStackFrames(StackWalker.java:99)
	at com.ibm.j9ddr.vm29.tools.ddrinteractive.commands.StackWalkCommand.run(StackWalkCommand.java:144)
	at openj9.dtfj/com.ibm.j9ddr.tools.ddrinteractive.Context.tryCommand(Context.java:229)
	at openj9.dtfj/com.ibm.j9ddr.tools.ddrinteractive.Context.execute(Context.java:202)
	at openj9.dtfj/com.ibm.j9ddr.tools.ddrinteractive.DDRInteractive.execute(DDRInteractive.java:356)
	at openj9.dtfj/com.ibm.j9ddr.command.CommandReader.processLine(CommandReader.java:78)
	at openj9.dtfj/com.ibm.j9ddr.tools.ddrinteractive.DDRInteractive.processLine(DDRInteractive.java:331)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:572)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.CombinedContext.executeDDRInteractiveCommand(CombinedContext.java:258)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.CombinedContext.execute(CombinedContext.java:169)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.Session.execute(Session.java:813)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.Session.execute(Session.java:767)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.tools.ToolsRegistry.executeJdmpviewCommand(ToolsRegistry.java:183)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.tools.ToolsRegistry.execute(ToolsRegistry.java:107)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.tools.ToolsRegistry.execute(ToolsRegistry.java:88)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.tools.ToolsRegistry.recordAndExecute(ToolsRegistry.java:72)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.Session.runInteractive(Session.java:741)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.Session.run(Session.java:665)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.DTFJView.launch(DTFJView.java:52)
	at openj9.dtfjview/com.ibm.jvm.dtfjview.DTFJView.main(DTFJView.java:46)

Stack walk result: STACK_CORRUPT
> !j9vmthread 0x0101f400
J9VMThread at 0x101f400 {
  Fields for J9VMThread:
	0x0: struct JNINativeInterface_* functions = !jninativeinterface_ 0x00007F0FD07A7BA0
	0x8: struct J9JavaVM* javaVM = !j9javavm 0x00007F0FCC03FC30
	0x10: UDATA* arg0EA = !j9x 0x000000000107FB38
	0x18: UDATA* bytecodes = !j9x 0x0000000000000000
	0x20: UDATA* sp = !j9x 0x000000000107FCE0
	0x28: U8* pc = !j9x 0x0000000000000006
	0x30: struct J9Method* literals = !j9method 0x0000000000000000
	0x38: UDATA jitStackFrameFlags = 0x0000000000000000 (0)
	0x40: struct J9Object* jitException = !j9object 0x0000000000000000
	0x48: struct J9Object* currentException = !j9object 0x0000000000000000
	0x50: UDATA* stackOverflowMark = !j9x 0xFFFFFFFFFFFFFFFF
	0x58: UDATA* stackOverflowMark2 = !j9x 0x000000000107BB48
	0x60: U8* heapAlloc = !j9x 0x00000000F4949510 // ""
	0x68: U8* heapTop = !j9x 0x00000000F4952C20 // ""
	0x70: IDATA tlhPrefetchFTA = 0x0000000000000000 (0)
	0x78: U8* nonZeroHeapAlloc = !j9x 0x0000000000000000
	0x80: U8* nonZeroHeapTop = !j9x 0x0000000000000000
	0x88: IDATA nonZeroTlhPrefetchFTA = 0x0000000000000000 (0)
	0x90: struct J9ThreadMonitor* publicFlagsMutex = !j9threadmonitor 0x00007F0EE4006F98
	0x98: UDATA publicFlags = 0x0000000000000021 (33)
	0xa0: UDATA compressObjectReferences = 0x0000000000000001 (1)
	0xa8: struct J9Object* threadObject = !j9object 0x00000000A5320918 // akka/dispatch/MonitorableThreadFactory$AkkaForkJoinWorkerThread
	0xb0: void* lowTenureAddress = !j9x 0x00000000A35F0000
	0xb8: void* highTenureAddress = !j9x 0x00000000ADB50000
	0xc0: void* heapBaseForActiveCardTable = !j9x 0x0000000000000000
	0xc8: void* activeCardTableBase = !j9x 0x00007F0FC8C43000
	0xd0: UDATA heapSizeForActiveCardTable = 0x0000000000000000 (0)
	0xd8: void* heapBaseForBarrierRange0 = !j9x 0x00000000A35F0000
	0xe0: UDATA heapSizeForBarrierRange0 = 0x000000000A560000 (173408256)
	0xe8: UDATA* jniLocalReferences = !j9x 0x0000000000000000
	0xf0: UDATA tempSlot = 0x00007F0F5026C9D4 (139703745956308)
	0xf8: void* jitReturnAddress = !j9x 0x00007F0F4FC302DA
	0x100: void* floatTemp1 = !j9x 0x00000000AB038338
	0x108: void* floatTemp2 = !j9x 0x00000000F4949500
	0x110: void* floatTemp3 = !j9x 0x0000000000000000
	0x118: void* floatTemp4 = !j9x 0x0000000000000000
	0x120: UDATA returnValue = 0x00000000F490FC20 (4103142432)
	0x128: UDATA returnValue2 = 0x0000000000000001 (1)
	0x130: UDATA* objectFlagSpinLockAddress = !j9x 0x0000000000000000
	0x138: struct J9JavaStack* stackObject = !j9javastack 0x0000000001079340
	0x140: struct J9Thread* osThread = !j9thread 0x00007F0EFC005BE8
	0x148: UDATA inspectionSuspendCount = 0x0000000000000000 (0)
	0x150: UDATA inspectorCount = 0x0000000000000000 (0)
	0x158: U32 eventFlags = 0x00000010 (16)
	0x15c: U32 osrFrameIndex = 0x00000088 (136)
	0x160: void* codertTOC = !j9x 0x0000000000000000
	0x168: U8* cardTableVirtualStart = !j9x 0x00007F0FC8728080 // ","
	0x170: struct J9Object* stopThrowable = !j9object 0x0000000000000000
	0x178: struct J9Object* outOfMemoryError = !j9object 0x00000000A58711E0 // java/lang/OutOfMemoryError
	0x180: UDATA* jniCurrentReference = !j9x 0x0000000000000000
	0x188: UDATA* jniLimitReference = !j9x 0x0000000000000000
	0x190: struct J9VMThread* linkNext = !j9vmthread 0x0000000000E64800
	0x198: struct J9VMThread* linkPrevious = !j9vmthread 0x0000000000ED2C00
	0x1a0: UDATA privateFlags = 0x0000000000100000 (1048576)
	0x1a8: UDATA jitTOC = 0x0000000000000000 (0)
	0x1b0: UDATA ferReturnType = 0x0000000000000000 (0)
	0x1b8: U64 ferReturnValue = 0x0000000000000000 (0)
	0x1c0: U64 mgmtBlockedTimeTotal = 0x0000000000000000 (0)
	0x1c8: U64 mgmtBlockedTimeStart = 0x0000000000000000 (0)
	0x1d0: U64 mgmtWaitedTimeTotal = 0x0000000000000000 (0)
	0x1d8: U64 mgmtWaitedTimeStart = 0x0000000000000000 (0)
	0x1e0: UDATA jniVMAccessCount = 0x0000000000000000 (0)
	0x1e8: UDATA debugEventData1 = 0x0000000000000000 (0)
	0x1f0: UDATA debugEventData2 = 0x0000000000000000 (0)
	0x1f8: UDATA debugEventData3 = 0x0000000000000000 (0)
	0x200: UDATA debugEventData4 = 0x0000000000000000 (0)
	0x208: UDATA debugEventData5 = 0x0000000000000000 (0)
	0x210: UDATA debugEventData6 = 0x0000000000000000 (0)
	0x218: UDATA debugEventData7 = 0x0000000000000000 (0)
	0x220: UDATA debugEventData8 = 0x0000000000000000 (0)
	0x228: struct J9StackElement* classLoadingStack = !j9stackelement 0x0000000000000000
	0x230: UDATA jitTransitionJumpSlot = 0x0000000000000000 (0)
	0x238: struct J9ThreadMonitor* gcClassUnloadingMutex = !j9threadmonitor 0x0000000000000000
	0x240: struct J9VMThread* gcClassUnloadingThreadPrevious = !j9vmthread 0x0000000000000000
	0x248: struct J9VMThread* gcClassUnloadingThreadNext = !j9vmthread 0x0000000000000000
	0x250: struct J9StackWalkState* stackWalkState = !j9stackwalkstate 0x000000000101F700
	0x258: struct J9VMEntryLocalStorage* entryLocalStorage = !j9vmentrylocalstorage 0x00007F0F4DA74B80
	0x260: UDATA gpProtected = 0x0000000000000001 (1)
	0x268: struct J9VMGCSublistFragment gcRememberedSet = !j9vmgcsublistfragment 0x000000000101F668
	0x298: struct MM_GCRememberedSetFragment sATBBarrierRememberedSetFragment = !mm_gcrememberedsetfragment 0x000000000101F698
	0x2c8: void* gcTaskListPtr = !j9x 0x0000000000000000
	0x2d0: UDATA* dropBP = !j9x 0x0000000000000000
	0x2d8: UDATA dropFlags = 0x0000000000000000 (0)
	0x2e0: struct J9Pool* monitorEnterRecordPool = !j9pool 0x00007F0EFC007080
	0x2e8: struct J9MonitorEnterRecord* monitorEnterRecords = !j9monitorenterrecord 0x0000000000000000
	0x2f0: UDATA* jniArrayCache = !j9x 0x0000000000000000
	0x2f8: UDATA* jniArrayCache2 = !j9x 0x0000000000000000
	0x300: struct J9StackWalkState inlineStackWalkState = !j9stackwalkstate 0x000000000101F700
	0x538: struct J9JITDecompilationInfo* decompilationStack = !j9jitdecompilationinfo 0x0000000000000000
	0x540: struct J9ModronThreadLocalHeap allocateThreadLocalHeap = !j9modronthreadlocalheap 0x000000000101F940
	0x570: struct J9ModronThreadLocalHeap nonZeroAllocateThreadLocalHeap = !j9modronthreadlocalheap 0x000000000101F970
	0x5a0: void* sidecarEvent = !j9x 0x0000000000000000
	0x5a8: struct PortlibPTBuffers_struct* ptBuffers = !portlibptbuffers_ 0x0000000000000000
	0x5b0: struct J9Object* blockingEnterObject = !j9object 0x0000000000000000
	0x5b8: void* gcExtensions = !j9x 0x00007F0EFC01EC88
	0x5c0: UDATA contiguousIndexableHeaderSize = 0x0000000000000008 (8)
	0x5c8: UDATA discontiguousIndexableHeaderSize = 0x0000000000000010 (16)
	0x5d0: UDATA isIndexableDataAddrPresent = 0x0000000000000000 (0)
	0x5d8: U32 isVirtualLargeObjectHeapEnabled = 0x00000000 (0)
	0x5e0: void* gpInfo = !j9x 0x0000000000000000
	0x5e8: void* jitVMwithThreadInfo = !j9x 0x00007F0EEC01F360
	0x5f0: U8* profilingBufferEnd = !j9x 0x0000000000000000
	0x5f8: U8* profilingBufferCursor = !j9x 0x0000000000000000
	0x600: UDATA* j2iFrame = !j9x 0x0000000000000000
	0x608: UDATA currentOSStackFree = 0x000000000003CBE0 (248800)
	0x610: UDATA mgmtBlockedCount = 0x0000000000000016 (22)
	0x618: UDATA mgmtWaitedCount = 0x000000000000002A (42)
	0x620: UDATA mgmtBlockedStart = 0x0000000000000000 (0)
	0x628: UDATA mgmtWaitedStart = 0x0000000000000000 (0)
	0x630: UDATA cardTableShiftSize = 0x0000000000000009 (9)
	0x638: void* aotVMwithThreadInfo = !j9x 0x0000000000000000
	0x640: UDATA asyncEventFlags = 0x0000000000000004 (4)
	0x648: struct J9Object* forceEarlyReturnObjectSlot = !j9object 0x0000000000000000
	0x650: struct J9MonitorEnterRecord* jniMonitorEnterRecords = !j9monitorenterrecord 0x0000000000000000
	0x658: struct J9DLTInformationBlock dltBlock = !j9dltinformationblock 0x000000000101FA58
	0x818: struct J9VMGCSegregatedAllocationCacheEntry* segregatedAllocationCache = !j9vmgcsegregatedallocationcacheentry 0x0000000000000000
	0x820: struct J9StackWalkState* activeWalkState = !j9stackwalkstate 0x00007F0F4DA73D60
	0x828: void* jniCalloutArgs = !j9x 0x0000000000000000
	0x830: struct J9VMThread* exclusiveVMAccessQueueNext = !j9vmthread 0x0000000000000000
	0x838: struct J9VMThread* exclusiveVMAccessQueuePrevious = !j9vmthread 0x0000000000000000
	0x840: struct J9Object* javaLangThreadLocalCache = !j9object 0x0000000000000000
	0x848: UDATA jitCountDelta = 0x0000000000000002 (2)
	0x850: UDATA maxProfilingCount = 0x0000000000001771 (6001)
	0x858: j9objectmonitor_t[] objectMonitorLookupCache = !j9x 0x000000000101FC58
	0x958: UDATA jniCriticalCopyCount = 0x0000000000000000 (0)
	0x960: UDATA jniCriticalDirectCount = 0x0000000000000000 (0)
	0x968: struct J9Pool* jniReferenceFrames = !j9pool 0x0000000000000000
	0x970: U32 ludclInlineDepth = 0x00000000 (0)
	0x974: U32 ludclBPOffset = 0x00000000 (0)
	0x978: UDATA* osrJittedFrameCopy = !j9x 0x0000000000000000
	0x980: struct J9OSRBuffer* osrBuffer = !j9osrbuffer 0x0000000000000000
	0x988: void* osrReturnAddress = !j9x 0x00007F0FCB7A30E0
	0x990: void* osrScratchBuffer = !j9x 0x00007F0EEC003B30
	0x998: void* jitArtifactSearchCache = !j9x 0x00007F0EEC027901
	0x9a0: void* jitExceptionHandlerCache = !j9x 0x0000000000000000
	0x9a8: void* jitPrivateData = !j9x 0x00007F0F84616FE0
	0x9b0: struct J9Method* jitMethodToBeCompiled = !j9method 0x0000000000000000
	0x9b8: UDATA privateFlags2 = 0x0000000000000008 (8)
	0x9c0: struct OMR_VMThread* omrVMThread = !omr_vmthread 0x000000000101FE68
	0x9c8: void* gpuInfo = !j9x 0x0000000000000000
	0x9d0: void* startOfMemoryBlock = !j9x 0x000000000101F3D8
	0x9d8: UDATA inNative = 0x0000000000000000 (0)
	0x9e0: struct J9JITDecompilationInfo* lastDecompilation = !j9jitdecompilationinfo 0x0000000000000000
	0x9e8: struct J9GSParameters gsParameters = !j9gsparameters 0x000000000101FDE8
	0x9f8: UDATA readBarrierRangeCheckBase = 0xFFFFFFFFFFFFFFFF (-1)
	0xa00: UDATA readBarrierRangeCheckTop = 0x0000000000000000 (0)
	0xa08: U32 readBarrierRangeCheckBaseCompressed = 0xFFFFFFFF (4294967295)
	0xa0c: U32 readBarrierRangeCheckTopCompressed = 0x00000000 (0)
	0xa10: UDATA safePointCount = 0x0000000000000000 (0)
	0xa18: volatile struct J9HashTable* utfCache = !j9hashtable 0x0000000000000000
	0xa20: struct J9JFRBuffer jfrBuffer = !j9jfrbuffer 0x000000000101FE20
	0xa40: UDATA unsafeIndexableHeaderSize = 0x0000000000000008 (8)
	0xa48: struct J9ThreadJFRState threadJfrState = !j9threadjfrstate 0x000000000101FE48
}

@a7ehuo
Copy link
Contributor

a7ehuo commented Nov 21, 2024

I added Assert_Util_true(signature[0] == '('); in getSendSlotsFromSignature hoping to catch the invalid case right when it happens. But now I'm seeing an assert from performDecompile.

05:02:13.592 0x1aa00j9codertvm(j9ji.104    *   ** ASSERTION FAILED ** at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/codert_vm/decomp.cpp:836: ((vm->jitConfig->fsdEnabled))

The assert happens because both decompRecord->usesOSR and currentThread->javaVM->jitConfig->fsdEnabled are 0 [1]. Another weird thing in decompRecord is that decompRecord->osrBuffer->numberOfFrames is also 0 [2]. The decompRecord->method is java/lang/ClassValue.getFromHashMap(Ljava/lang/Class;)Ljava/lang/Object;. The data from currentThread->lastDecompilation looks valid which is also for method java/lang/ClassValue.getFromHashMap(Ljava/lang/Class;)Ljava/lang/Object; [3].

[1]

#13 0x00007f77502c1010 in javaTrace (env=<optimized out>, modInfo=<optimized out>, traceId=<optimized out>, spec=<optimized out>)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/rastrace/trcengine.c:1388
#14 0x00007f774bd01e2b in performDecompile (currentThread=currentThread@entry=0x1aa00, 
    decompileState=decompileState@entry=0x7f7750e4e270, decompRecord=decompRecord@entry=0x7f774e291b80, 
    osrFrame=osrFrame@entry=0x7f774e291bc8, numberOfFrames=numberOfFrames@entry=0)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/codert_vm/decomp.cpp:836
#15 0x00007f774bd01f68 in jitDecompileMethod (currentThread=currentThread@entry=0x1aa00, decompRecord=0x7f774e291b80)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/codert_vm/decomp.cpp:549
#16 0x00007f774bd03025 in c_jitDecompileAtCurrentPC (currentThread=0x1aa00)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/openj9/runtime/codert_vm/decomp.cpp:2535
#17 0x00007f774bd12fbc in jitDecompileAtCurrentPC ()
    at /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/vm/runtime/codert_vm/xnathelp.s:11545
#18 0x000000000001aa00 in ?? ()
#19 0x00007f7750e4e780 in ?? ()
#20 0x00000000a3dcfed8 in ?? ()
#21 0x00000000eb4e3338 in ?? ()
#22 0x000000000018f100 in ?? ()
--Type <RET> for more, q to quit, c to continue without paging--
#23 0x0000000000cbd528 in ?? ()
#24 0x0000000000000001 in ?? ()
#25 0x00000000a3782f40 in ?? ()
#26 0x00007f7750e4e6b0 in ?? ()
#27 0x0000000000000000 in ?? ()

[2]

(gdb) p decompRecord
$20 = (J9JITDecompilationInfo *) 0x7f774e291b80

> !j9jitdecompilationinfo 0x7f774e291b80
J9JITDecompilationInfo at 0x7f774e291b80 {
  Fields for J9JITDecompilationInfo:
	0x0: struct J9JITDecompilationInfo* next = !j9jitdecompilationinfo 0x0000000000000000
	0x8: UDATA* bp = !j9x 0x0000000000108708
	0x10: U8* pc = !j9x 0x00007F76D055EA5D // "��t.�hH�}P���
                                                                "
	0x18: void** pcAddress = !j9x 0x0000000000108640
	0x20: struct J9Method* method = !j9method 0x000000000018E008 // java/lang/ClassValue.getFromHashMap(Ljava/lang/Class;)Ljava/lang/Object;
	0x28: UDATA reason = 0x0000000000000080 (128)
	0x30: UDATA usesOSR = 0x0000000000000000 (0) //<------- 
	0x38: struct J9OSRBuffer osrBuffer = !j9osrbuffer 0x00007F774E291BB8
}
> !j9osrbuffer 0x00007F774E291BB8
J9OSRBuffer at 0x7f774e291bb8 {
  Fields for J9OSRBuffer:
	0x0: UDATA numberOfFrames = 0x0000000000000000 (0) //<------- 
	0x8: void* jitPC = !j9x 0x00007F7646505845
}

[3]

	0x9e0: struct J9JITDecompilationInfo* lastDecompilation = !j9jitdecompilationinfo 0x00007F76C400B8C0

> !j9jitdecompilationinfo 0x00007F76C400B8C0
J9JITDecompilationInfo at 0x7f76c400b8c0 {
  Fields for J9JITDecompilationInfo:
	0x0: struct J9JITDecompilationInfo* next = !j9jitdecompilationinfo 0x0000000000000000
	0x8: UDATA* bp = !j9x 0x0000000000108788
	0x10: U8* pc = !j9x 0x00007F76D055EA5D // "��t.�hH�}P���
                                                                "
	0x18: void** pcAddress = !j9x 0x00000000001086C0
	0x20: struct J9Method* method = !j9method 0x000000000018E008 // java/lang/ClassValue.getFromHashMap(Ljava/lang/Class;)Ljava/lang/Object;
	0x28: UDATA reason = 0x0000000000000080 (128)
	0x30: UDATA usesOSR = 0x0000000000000001 (1)
	0x38: struct J9OSRBuffer osrBuffer = !j9osrbuffer 0x00007F76C400B8F8
}
> 

> !j9osrbuffer 0x00007F76C400B8F8
J9OSRBuffer at 0x7f76c400b8f8 {
  Fields for J9OSRBuffer:
	0x0: UDATA numberOfFrames = 0x0000000000000001 (1)
	0x8: void* jitPC = !j9x 0x00007F76D055EA5D
}

@gacholio
Copy link
Contributor

I doubt asserts work during ASGCT (trace in general is disabled), and it most definitely can not trigger decompilation.

@a7ehuo
Copy link
Contributor

a7ehuo commented Nov 21, 2024

doubt asserts work during ASGCT (trace in general is disabled) ...

That might explain why I didn't see the application stop to dump the core even when it detected the signature's format is incorrect in getSendSlotsFromSignature. I also added the following code right after Assert_Util_true(signature[0] == '('); hoping to get a crash right there.

   Assert_Util_true(signature[0] == '(');

   if (signature[0] != '(') {
      *(volatile int*)(0) = 1;
      raise(SIGABRT);
      raise(SIGTRAP);
   }

I also run the test with -Xdump:system+java+snap:events=user+gpf,request=exclusive+prepwalk+preempt. With the extra code and -Xdump option, I get the assert at performDecompile. I suspect the sequence of events might be that something is wrong first with the decompilation and then later the hang happens in AsyncGetCallTrace.

@gacholio
Copy link
Contributor

The entirety of the ASGCT call is signal protected (since we're so often attempting to walk in-flight stacks which results in a crash). Your best bet might be to print to the console and capture that output in the test run.

@a7ehuo
Copy link
Contributor

a7ehuo commented Nov 25, 2024

I added debug code in jitWalkResolveMethodFrame to print out some information as soon as it detects the first character of the signature is invalid. In one case where the invalid signature is detected [1], the resolveFrameType is 0x200000: J9_STACK_FLAGS_JIT_SPECIAL_METHOD_RESOLVE. The constantPool 0x7f7c42d6f5a0 is invalid [2]. That's why the retrieved romMethodRef and signature are all invalid. At the time, no method has been compiled [3]. I'm not sure how it could be related to JIT'd methods. @gacholio @TobiAjila Could the VM team take a further look? Thank you!

[1]

jitWalkResolveMethodFrame + 700 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so
jitWalkStackFrames + 2104 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so
walkStackFrames + 170 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9vm29.so
protectedASGCT + 481 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libjvm.so
omrsig_protect + 689 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9prt29.so
AsyncGetCallTrace + 360 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libjvm.so
Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int, StackContext*, bool*) + 374 in section .text of /tmp/libjavaProfiler8348843943949000496.so
Profiler::recordSample(void*, unsigned long long, int, int, unsigned int, Event*) + 487 in section .text of /tmp/libjavaProfiler8348843943949000496.so
CTimer::signalHandler(int, siginfo_t*, void*) + 315 in section .text of /tmp/libjavaProfiler8348843943949000496.so
__restore_rt in section .text of /lib/x86_64-linux-gnu/libc.so.6
VM_BytecodeInterpreterCompressed::run(J9VMThread*) + 12981 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9vm29.so
bytecodeLoopCompressed + 205 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9vm29.so
c_cInterpreter + 82 in section .text of /root/home/ahuo/src/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9vm29.so
====== akka-uct (concurrency) [default], iteration 0 started ======
GC before operation: completed in 141.216 ms, heap usage 17.822 MB -> 14.156 MB.
-----------------
jitWalkResolveMethodFrame: DEBUG walkState 0x7f7c42d6e3a0: walkThread 0x1aa00 currentThread 0x1aa00 method 0xa38c69d0 literals (nil) pc 0x1085f0 sp 0x108570 unwindSP 0x1c walkSP 0x108570
jitWalkResolveMethodFrame: DEBUG walkState 0x7f7c42d6e3a0: walkThread 0x1aa00 walkThread->pc 0x3 walkThread->sp 0x108570 walkThread->literals (nil)
jitWalkResolveMethodFrame: DEBUG trackStackTrace 0x7f7c42174e60 trackStackTraceIndex 1 resolveFrameType 0x200000 cpIndex 144 constantPool 0x7f7c42d6f5a0 romMethodRef 0x7f7c42ce4aa0 indexAndLiterals (nil)
jitWalkResolveMethodFrame: DEBUG signature 0x7f7c42dcca38 :   

[2]

> !j9constantpool 0x7f7c42d6f5a0
J9ConstantPool at 0x7f7c42d6f5a0 {
  Fields for J9ConstantPool:
	0x0: struct J9Class* ramClass = !j9class 0x00007F7C3EED8A98 // <FAULT>
	0x8: struct J9ROMConstantPoolItem* romConstantPool = !j9romconstantpoolitem 0x0000000000000000
}


> !j9rommethodref 0x7f7c42ce4aa0
J9ROMMethodRef at 0x7f7c42ce4aa0 {
  Fields for J9ROMMethodRef:
	0x0: U32 classRefCPIndex = 0x42CA3534 (1120548148)
	0x4: J9SRP(J9ROMNameAndSignature) nameAndSignature = !j9romnameandsignature 0x00007F7C42CECA20
}

> !j9romnameandsignature 0x00007F7C42CECA20
J9ROMNameAndSignature at 0x7f7c42ceca20 {
  Fields for J9ROMNameAndSignature:
	0x0: J9SRP(J9UTF8) name = !j9utf8 0x00007F7C42CECCC3
	0x4: J9SRP(J9UTF8) signature = !j9utf8 0x00007F7C42DCCA36

> !j9utf8 0x00007F7C42DCCA36
J9UTF8 at 0x7f7c42dcca36 {
  Fields for J9UTF8:
	0x0: U16 length = 0x0000 (0)
	0x2: U8[] data = !j9x 0x00007F7C42DCCA38
}

> !j9x 0x00007F7C42DCCA38
0x7F7C42DCCA38 :  0000000000000000 0000000000000000 [ ................ ] 
0x7F7C42DCCA48 :  0000000000000000 0000000000000000 [ ................ ] 
> 

[3]

> info jitm

> 
(kca)  method *  
    J9Class            J9Method               Start          Len {ClassPath/Name.MethodName}
--------------------------------------------------------------------------------------------
JIT'ed code memory:        ~0 bytes --
Number of JIT'ed methods:  ~0 methods --

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants