-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GR-61151] Trim internal JFR stacktraces to match OpenJDK #10385
base: master
Are you sure you want to change the base?
Conversation
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/SubstrateOptions.java
Outdated
Show resolved
Hide resolved
...evm/src/com.oracle.svm.core/src/com/oracle/svm/core/doc-files/JfrTrimInternalStackTraces.txt
Outdated
Show resolved
Hide resolved
@@ -71,6 +71,7 @@ public class JfrStackTraceRepository implements JfrRepository { | |||
private final JfrStackTraceEpochData epochData1; | |||
|
|||
private int stackTraceDepth; | |||
private boolean trimInternalStackTraces; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is a constant at runtime. Too bad that we can't declare it as final here. Not sure how important that is in practice though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's a bit unfortunate. It gets set during startup once the runtime options are parsed. We also need to change this during the JFR tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to turn the option from a RuntimeOptionKey
into a HostedOptionKey
and make sure trimming is installed at build time? Is it really necessary for the option to be mutable at run time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it could make sense to set this option at build time with a HostedOptionKey
. The 2 downsides would be:
- We'd need to remove the test that checks trimming stacktraces, since we won't be able to toggle the option dynamically at runtime. Is there a way to provide
HostedOptionKey
to themx native-unittest
build? - Users will need to rebuild their executables if they discover that they need full stacktraces. This is probably not much of a problem since I think most users won't need that.
Do you think it's worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE 1: I guess you could unconditionally disable trimming in native_unittests_task
in mx_substratevm.py
. That should do the try on the gate. When you want to run mx native-unittest
on the cmd line, you need to pass it as an additional build flag.
RE 2: Agreed, normal users are probably not too interested in this. Disabling trimming only really seems useful to measure JFR overhead, something only we really care about, right?
Do you think it's worth it?
Leaving this up to you. :)
@@ -167,8 +169,12 @@ private static int visitFrame(JfrNativeEventWriterData data, CodeInfo codeInfo, | |||
int numStackTraceElements = 0; | |||
FRAME_INFO_CURSOR.initialize(codeInfo, ip, false); | |||
while (FRAME_INFO_CURSOR.advance()) { | |||
FrameInfoQueryResult frame = FRAME_INFO_CURSOR.get(); | |||
if (SubstrateJVM.getStackTraceRepo().getTrimInternalStackTraces() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's constant, I prefer moving the condition before the loop, even if the compiler can do that for us ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm not sure we can move the conditional out of the loop since we must recheck it with each new advancement of FRAME_INFO_CURSOR
. SubstrateJVM.getStackTraceRepo().getTrimInternalStackTraces()
is constant but we can't even move that part outside since we still need to execute the loop to potentially serialize frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was only talking about moving SubstrateJVM.getStackTraceRepo().getTrimInternalStackTraces()
outside the loop.
if (data.isNonNull()) { | ||
FrameInfoQueryResult frame = FRAME_INFO_CURSOR.get(); | ||
serializeStackTraceElement(data, frame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this additional check be within the if (data.isNonNull()) {
branch or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe moving it inside if (data.isNonNull())
would be a small optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think it should be moved inside the if (data.isNonNull())
check. We must increment numStackTraceElements
regardless of the if (data.isNonNull())
check, but must not increment if we are trimming the current stacktrace.
…bstrateOptions.java Co-authored-by: Fabio Niephaus <[email protected]>
@@ -167,8 +170,12 @@ private static int visitFrame(JfrNativeEventWriterData data, CodeInfo codeInfo, | |||
int numStackTraceElements = 0; | |||
FRAME_INFO_CURSOR.initialize(codeInfo, ip, false); | |||
while (FRAME_INFO_CURSOR.advance()) { | |||
FrameInfoQueryResult frame = FRAME_INFO_CURSOR.get(); | |||
if (VMInspectionOptions.JfrTrimInternalStackTraces.getValue() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filtering everything related to com.oracle.svm.*
seems too generic and may omit relevant stack frames (e.g., cases where Native Image actually implements some runtime functionality in Java). I think that this logic needs to be more selective.
I also think that you should only filter frames at the very top of the stack trace. Your current approach may omit frames in the middle of the stack, which will result in broken/impossible stack traces.
Doing the filtering during the serialization seems problematic as well. Users that configure a small stack depth (e.g., -XX:FlightRecorderOptions=stackdepth=5
), may end up with a lot of empty stack traces.
Besides that, I think that filtering is only reasonably easy for JFR events but difficult for JFR sampling:
- JFR events: here, we can probably just skip all frames until we encounter some well-known top-level frame of our JFR infrastructure (or some well-known VM-internal class).
- sampling: this is a lot more complex, especially with the async sampler, as we may see completely arbitrary stack frames. How would we decide if we are in VM internal code without walking a lot (maybe even all?) of the stack? E.g., we might be in VM-internal code but there can be some JDK code at the top of the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback. How do you feel about using skipCount
to trim only internal JFR frames on SubstrateVM level events? This would be just the top ~3 frames and not affect CPU profiler samples or JDK events. This would avoid filtering out any middle frames, and also avoid unnecessarily shortening stacktraces
due to filtering at serialization time. Limiting the trimming to only JFR frames while keeping other internal frames would also make it easier to maintain. JDK events use a hardcoded skipcount too.
This wouldn't exactly match what users see in OpenJDK, but would at least get rid of the top few redundant internal JFR frames.
Alternatively, like you mentioned, we could filter from the top of the stack down until we hit a known method. And only do this for VM-level events while walking the stack. This involves more steps since it must use FrameInfoCursor
while stack walking, but it's nice that it avoids needing to manually set a skipcount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to give the hardcoded skipcount a try as it is the simplest approach. If that turns out to be a maintenance burden, we can always implement a more complex approach later on.
We will need fairly good tests for the stack traces of the different JFR events though. Otherwise, it is probably very easy to introduce a regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented the skipcount approach and modified the tests for all the events affected to ensure we don't end up skipping too many frames. I left the JfrTrimInternalStackTraces
hosted option set to true by default.
Note: |
|
||
/** | ||
* This file contains the VM-level events that Native Image supports on all JDK versions. The event | ||
* IDs depend on the JDK version (see metadata.xml file) and are computed at image build time. | ||
*/ | ||
public final class JfrEvent { | ||
public static final JfrEvent ThreadStart = create("jdk.ThreadStart"); | ||
private static final int defaultInternalSkipCount = VMInspectionOptions.JfrTrimInternalStackTraces.getValue() ? 3 : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we apply the skip count to all JFR events or are there any events that need to be excluded explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be fine to apply the skipcount to all events defined in SubstrateVM. The periodic ones and others that don't contain stacktraces would just ignore the skipcount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then let's use the skipCount for all events. Please remove the static field defaultInternalSkipCount
and move its logic directly into getSkipCount()
.
@@ -79,6 +80,11 @@ private static void validateEvents(List<RecordedEvent> events) { | |||
boolean parkUnparkFound = false; | |||
|
|||
for (RecordedEvent event : events) { | |||
List<RecordedFrame> frames = event.getStackTrace().getFrames(); | |||
assertTrue(frames.size() > 0); | |||
System.out.println(frames.getFirst().getMethod().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this on purpose or is it a leftover from debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should have been replaced with checkStackTraceTrimming
. Thanks for spotting that.
@@ -54,6 +54,9 @@ public void test() throws Throwable { | |||
|
|||
private static void validateEvents(List<RecordedEvent> events) { | |||
assertTrue(events.size() > 0); | |||
for (RecordedEvent event : events) { | |||
checkStackTraceTrimming(event, "emit"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we trim the "emit" methods as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to trim the "emit" methods as well, but when I increment the skipcount, it seems to trim 2 frames. Maybe this is due to inlining? I'm not entirely sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I decided it was better to trim slightly less and keep more information in the stacktrace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to trim 2 frames
Interesting - it seems that the skipCount
logic is broken at the moment as it is applied on the level of physical stack frames (instead of virtual frames). We need to fix that, otherwise, skipCount
will be pretty much unpredictable.
We would probably need to do something like the following:
- remove the
skipCount
fromJfrStackWalker
(at that point in time, we only know about the physical stack frames) - modify
JfrStackWalker
so that it stores the requested skip count at the very beginning of the sampler data (i.e., before any recorded IPs) - when serializing the raw IPs into JFR stack traces, we need to apply the skip count on the level of Java frames
Please let me know if you can look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can look into this. I'll add that change to this pull request (unless you think it's cleaner separately).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also probably need to delay when we do truncation to the time of serialization. Otherwise we may truncate unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I think that 2 different kinds of truncation are needed for the bottom-most frames:
JfrStackWalker
: truncate so that we have at mostskipCount + getMaxDepth()
physical frames (each physical frame corresponds to at least 1 virtual frame)- during serialization: apply skipCount to remove the top frames and truncate the bottom-most frames so that we have at most
getMaxDepth()
virtual frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I see. I'll work on adding this. Thanks!
25f2411
to
23de2fb
Compare
Hi @christianhaeubl, I've implemented the fix for the skipcount bug we talked about. Please have another look when you have time. Thank you! |
ec138f2
to
1496d78
Compare
1496d78
to
5e4048d
Compare
Summary:
OpenJDK JFR does not record VM level frames. Previously, Native Image JFR reported internal SubstrateVM frames in stacktraces. This is bad because it makes it confusing/difficult for users to see where events are coming from since the ends of every trace look the same. For example, this adds noise to flamegraphs. It can also mislead to users to believing that SubstrateVM internals are the cause of event emissions if they are accustomed to the way stacktraces are reported in OpenJDK.
This PR adds trimming of internal JFR stacktraces by default. I've also added a runtime option to disable this (and revert to the old behavior) to aid in Native Image development/debugging. Now, any SubstrateVM frames in samples taken by the CPU profiler or regular events are filtered out.
Use
-XX:-JfrTrimInternalStackTraces
to disable the filtering at runtime.Related Issue: #7095
Note:
In this PR, the filtering is done at the time of stacktrace serialization. This happens at a chunk rotation. The problem with this approach is that we have already computed the stacktraces up to the maximum depth specified by the user (default is 64). So, it the point of sampling, we may need to truncate at the max depth, then later we may further reduce the stacktrace size by filtering during serialization. This means we would have truncated some frames unnecessarily. In reality, this is probably not much of an issue since there's only 3-5 internal frames being filtered out for most events. I've considered the below alternatives:
FrameInfoCursor
instance. This isn't possible because we need to do the stack-walking inside the JFR sampler signal handler. It's also not possible to allocate a newFrameInfoCursor
whenever we need to use it. One solution is to preallocate an instance ofFrameInfoCursor
for each thread and useJfrExecutionSampler.preventSamplingInCurrentThread()
to prevent races if preempted for signal handling. But this seems overly complex and unnecessary compared to the current solution.