Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GR-61151] Trim internal JFR stacktraces to match OpenJDK #10385

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

roberttoyonaga
Copy link
Collaborator

Summary:

OpenJDK JFR does not record VM level frames. Previously, Native Image JFR reported internal SubstrateVM frames in stacktraces. This is bad because it makes it confusing/difficult for users to see where events are coming from since the ends of every trace look the same. For example, this adds noise to flamegraphs. It can also mislead to users to believing that SubstrateVM internals are the cause of event emissions if they are accustomed to the way stacktraces are reported in OpenJDK.

This PR adds trimming of internal JFR stacktraces by default. I've also added a runtime option to disable this (and revert to the old behavior) to aid in Native Image development/debugging. Now, any SubstrateVM frames in samples taken by the CPU profiler or regular events are filtered out.

Use -XX:-JfrTrimInternalStackTraces to disable the filtering at runtime.

Related Issue: #7095

Note:

In this PR, the filtering is done at the time of stacktrace serialization. This happens at a chunk rotation. The problem with this approach is that we have already computed the stacktraces up to the maximum depth specified by the user (default is 64). So, it the point of sampling, we may need to truncate at the max depth, then later we may further reduce the stacktrace size by filtering during serialization. This means we would have truncated some frames unnecessarily. In reality, this is probably not much of an issue since there's only 3-5 internal frames being filtered out for most events. I've considered the below alternatives:

  1. Use "skipCount" instead of filtering. This would require maintaining a hardcoded skipcount for each event type. For some events (such ObjectAllocationSample) multiple paths could be taken leading up to event emission so a single event may require different hardcoded skipcounts depending on the code path taken. I don't like this approach because it becomes harder to maintain as more events are added and the internal code changes.
  2. Filter earlier at the time of stack-walking (long before stacktrace serialization). This would allow us to avoid truncating traces unless truly necessary. The problem with this is that we'd need to synchronize access to a FrameInfoCursor instance. This isn't possible because we need to do the stack-walking inside the JFR sampler signal handler. It's also not possible to allocate a new FrameInfoCursor whenever we need to use it. One solution is to preallocate an instance of FrameInfoCursor for each thread and use JfrExecutionSampler.preventSamplingInCurrentThread() to prevent races if preempted for signal handling. But this seems overly complex and unnecessary compared to the current solution.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Dec 27, 2024
@@ -71,6 +71,7 @@ public class JfrStackTraceRepository implements JfrRepository {
private final JfrStackTraceEpochData epochData1;

private int stackTraceDepth;
private boolean trimInternalStackTraces;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a constant at runtime. Too bad that we can't declare it as final here. Not sure how important that is in practice though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's a bit unfortunate. It gets set during startup once the runtime options are parsed. We also need to change this during the JFR tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to turn the option from a RuntimeOptionKey into a HostedOptionKey and make sure trimming is installed at build time? Is it really necessary for the option to be mutable at run time?

Copy link
Collaborator Author

@roberttoyonaga roberttoyonaga Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it could make sense to set this option at build time with a HostedOptionKey. The 2 downsides would be:

  1. We'd need to remove the test that checks trimming stacktraces, since we won't be able to toggle the option dynamically at runtime. Is there a way to provide HostedOptionKey to the mx native-unittest build?
  2. Users will need to rebuild their executables if they discover that they need full stacktraces. This is probably not much of a problem since I think most users won't need that.

Do you think it's worth it?

Copy link
Member

@fniephaus fniephaus Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE 1: I guess you could unconditionally disable trimming in native_unittests_task in mx_substratevm.py. That should do the try on the gate. When you want to run mx native-unittest on the cmd line, you need to pass it as an additional build flag.
RE 2: Agreed, normal users are probably not too interested in this. Disabling trimming only really seems useful to measure JFR overhead, something only we really care about, right?

Do you think it's worth it?

Leaving this up to you. :)

@@ -167,8 +169,12 @@ private static int visitFrame(JfrNativeEventWriterData data, CodeInfo codeInfo,
int numStackTraceElements = 0;
FRAME_INFO_CURSOR.initialize(codeInfo, ip, false);
while (FRAME_INFO_CURSOR.advance()) {
FrameInfoQueryResult frame = FRAME_INFO_CURSOR.get();
if (SubstrateJVM.getStackTraceRepo().getTrimInternalStackTraces() &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's constant, I prefer moving the condition before the loop, even if the compiler can do that for us ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not sure we can move the conditional out of the loop since we must recheck it with each new advancement of FRAME_INFO_CURSOR. SubstrateJVM.getStackTraceRepo().getTrimInternalStackTraces() is constant but we can't even move that part outside since we still need to execute the loop to potentially serialize frames.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was only talking about moving SubstrateJVM.getStackTraceRepo().getTrimInternalStackTraces() outside the loop.

if (data.isNonNull()) {
FrameInfoQueryResult frame = FRAME_INFO_CURSOR.get();
serializeStackTraceElement(data, frame);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this additional check be within the if (data.isNonNull()) { branch or not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maybe moving it inside if (data.isNonNull()) would be a small optimization

Copy link
Collaborator Author

@roberttoyonaga roberttoyonaga Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't think it should be moved inside the if (data.isNonNull()) check. We must increment numStackTraceElements regardless of the if (data.isNonNull()) check, but must not increment if we are trimming the current stacktrace.

@@ -167,8 +170,12 @@ private static int visitFrame(JfrNativeEventWriterData data, CodeInfo codeInfo,
int numStackTraceElements = 0;
FRAME_INFO_CURSOR.initialize(codeInfo, ip, false);
while (FRAME_INFO_CURSOR.advance()) {
FrameInfoQueryResult frame = FRAME_INFO_CURSOR.get();
if (VMInspectionOptions.JfrTrimInternalStackTraces.getValue() &&
Copy link
Member

@christianhaeubl christianhaeubl Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering everything related to com.oracle.svm.* seems too generic and may omit relevant stack frames (e.g., cases where Native Image actually implements some runtime functionality in Java). I think that this logic needs to be more selective.

I also think that you should only filter frames at the very top of the stack trace. Your current approach may omit frames in the middle of the stack, which will result in broken/impossible stack traces.

Doing the filtering during the serialization seems problematic as well. Users that configure a small stack depth (e.g., -XX:FlightRecorderOptions=stackdepth=5), may end up with a lot of empty stack traces.

Besides that, I think that filtering is only reasonably easy for JFR events but difficult for JFR sampling:

  • JFR events: here, we can probably just skip all frames until we encounter some well-known top-level frame of our JFR infrastructure (or some well-known VM-internal class).
  • sampling: this is a lot more complex, especially with the async sampler, as we may see completely arbitrary stack frames. How would we decide if we are in VM internal code without walking a lot (maybe even all?) of the stack? E.g., we might be in VM-internal code but there can be some JDK code at the top of the stack.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback. How do you feel about using skipCount to trim only internal JFR frames on SubstrateVM level events? This would be just the top ~3 frames and not affect CPU profiler samples or JDK events. This would avoid filtering out any middle frames, and also avoid unnecessarily shortening stacktraces
due to filtering at serialization time. Limiting the trimming to only JFR frames while keeping other internal frames would also make it easier to maintain. JDK events use a hardcoded skipcount too.

This wouldn't exactly match what users see in OpenJDK, but would at least get rid of the top few redundant internal JFR frames.

Alternatively, like you mentioned, we could filter from the top of the stack down until we hit a known method. And only do this for VM-level events while walking the stack. This involves more steps since it must use FrameInfoCursor while stack walking, but it's nice that it avoids needing to manually set a skipcount.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to give the hardcoded skipcount a try as it is the simplest approach. If that turns out to be a maintenance burden, we can always implement a more complex approach later on.

We will need fairly good tests for the stack traces of the different JFR events though. Otherwise, it is probably very easy to introduce a regression.

Copy link
Collaborator Author

@roberttoyonaga roberttoyonaga Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented the skipcount approach and modified the tests for all the events affected to ensure we don't end up skipping too many frames. I left the JfrTrimInternalStackTraces hosted option set to true by default.

@roberttoyonaga
Copy link
Collaborator Author

Note: truffle JDK21 GHA failing due to failure downloading Eclipse. Seems unrelated to this PR.

@christianhaeubl christianhaeubl changed the title Trim internal JFR stacktraces to match OpenJDK [GR-61151] Trim internal JFR stacktraces to match OpenJDK Jan 10, 2025

/**
* This file contains the VM-level events that Native Image supports on all JDK versions. The event
* IDs depend on the JDK version (see metadata.xml file) and are computed at image build time.
*/
public final class JfrEvent {
public static final JfrEvent ThreadStart = create("jdk.ThreadStart");
private static final int defaultInternalSkipCount = VMInspectionOptions.JfrTrimInternalStackTraces.getValue() ? 3 : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we apply the skip count to all JFR events or are there any events that need to be excluded explicitly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to apply the skipcount to all events defined in SubstrateVM. The periodic ones and others that don't contain stacktraces would just ignore the skipcount.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then let's use the skipCount for all events. Please remove the static field defaultInternalSkipCount and move its logic directly into getSkipCount().

@@ -79,6 +80,11 @@ private static void validateEvents(List<RecordedEvent> events) {
boolean parkUnparkFound = false;

for (RecordedEvent event : events) {
List<RecordedFrame> frames = event.getStackTrace().getFrames();
assertTrue(frames.size() > 0);
System.out.println(frames.getFirst().getMethod().getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this on purpose or is it a leftover from debugging?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should have been replaced with checkStackTraceTrimming . Thanks for spotting that.

@@ -54,6 +54,9 @@ public void test() throws Throwable {

private static void validateEvents(List<RecordedEvent> events) {
assertTrue(events.size() > 0);
for (RecordedEvent event : events) {
checkStackTraceTrimming(event, "emit");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we trim the "emit" methods as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to trim the "emit" methods as well, but when I increment the skipcount, it seems to trim 2 frames. Maybe this is due to inlining? I'm not entirely sure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I decided it was better to trim slightly less and keep more information in the stacktrace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to trim 2 frames

Interesting - it seems that the skipCount logic is broken at the moment as it is applied on the level of physical stack frames (instead of virtual frames). We need to fix that, otherwise, skipCount will be pretty much unpredictable.

We would probably need to do something like the following:

  • remove the skipCount from JfrStackWalker (at that point in time, we only know about the physical stack frames)
  • modify JfrStackWalker so that it stores the requested skip count at the very beginning of the sampler data (i.e., before any recorded IPs)
  • when serializing the raw IPs into JFR stack traces, we need to apply the skip count on the level of Java frames

Please let me know if you can look into that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can look into this. I'll add that change to this pull request (unless you think it's cleaner separately).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also probably need to delay when we do truncation to the time of serialization. Otherwise we may truncate unnecessarily.

Copy link
Member

@christianhaeubl christianhaeubl Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I think that 2 different kinds of truncation are needed for the bottom-most frames:

  • JfrStackWalker: truncate so that we have at most skipCount + getMaxDepth() physical frames (each physical frame corresponds to at least 1 virtual frame)
  • during serialization: apply skipCount to remove the top frames and truncate the bottom-most frames so that we have at most getMaxDepth() virtual frames.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see. I'll work on adding this. Thanks!

@roberttoyonaga roberttoyonaga force-pushed the trim-stacktraces branch 2 times, most recently from 25f2411 to 23de2fb Compare January 15, 2025 21:44
@roberttoyonaga
Copy link
Collaborator Author

Hi @christianhaeubl, I've implemented the fix for the skipcount bug we talked about. Please have another look when you have time. Thank you!

@roberttoyonaga roberttoyonaga force-pushed the trim-stacktraces branch 2 times, most recently from ec138f2 to 1496d78 Compare January 16, 2025 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
native-image native-image-jfr OCA Verified All contributors have signed the Oracle Contributor Agreement. redhat-interest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants