
godeltaprof tests are failing on gotip #103

Closed · korniltsev opened this issue May 14, 2024 · 6 comments · Fixed by #108

@korniltsev (Collaborator)

Failing tests:
https://github.com/grafana/pyroscope-go/actions/runs/9072542438/job/24932183708

The bot PR #102 suggests this may be caused by upstream change https://go-review.googlesource.com/c/go/+/533258

@korniltsev (Collaborator, Author) commented May 21, 2024

this may be caused by this upstream change https://go-review.googlesource.com/c/go/+/533258

Indeed, this commit is the reason.

The actual reason is that before the change, this line of code https://github.com/golang/go/blob/22344e11f27d9667e7bbb6209df59e9a9e976d91/src/runtime/mprof.go#L776
used a stack obtained from callers/gcallers, which skips logical frames and performs inline unwinding, so the stack looks like this:

[
   0x48f931, //  sync.(*Mutex).Unlock
   0x48f918,  // main.main.func1
   0x46ba81,  // runtime.goexit
]

After the change, the same line uses a stack from fpTracebackPCs,
which looks like this:

[
   0x4,      // skip value
   0x467405, // sync.event 
   0x44ad65, // runtime.semrelease1
   0x468533, // sync.runtime_Semrelease 
   0x476b5b, // sync.(*Mutex).unlockSlow << this function has a branch, so value here depends on Mutex state
   0x48f931, // main.main.func1
   0x46ba81, // runtime.goexit
]

Because of that, runtime.stkbucket allocates two buckets for the same logical stack, and therefore we later get two BlockProfileRecords with equal stacks (godeltaprof assumed all BlockProfileRecord stacks are unique).

Here is a short program to reproduce:

https://go.dev/play/p/YD8N3UVACna?v=gotip
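
For readers who can't open the playground, here is a minimal sketch along the same lines (illustrative, not the exact playground code): contend on a sync.Mutex so unlockSlow is sometimes taken, then dump the raw MutexProfile records.

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	runtime.SetMutexProfileFraction(1)

	// Create contention so mutex profile events are recorded and
	// the unlockSlow path is sometimes taken.
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			for j := 0; j < 1000; j++ {
				_ = j // hold the lock briefly
			}
			mu.Unlock()
		}()
	}
	wg.Wait()

	// Dump the raw profile records; on gotip (before the upstream fix)
	// the same logical stack can show up in more than one record.
	records := make([]runtime.BlockProfileRecord, 256)
	n, ok := runtime.MutexProfile(records)
	if !ok {
		return // slice too small; a real program would grow it and retry
	}
	for _, r := range records[:n] {
		fmt.Println("=========================")
		for _, pc := range r.Stack() {
			if f := runtime.FuncForPC(pc); f != nil {
				fmt.Printf("%x  %s\n", pc, f.Name())
			}
		}
	}
}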

gotip:

=========================
443e01  runtime._LostContendedRuntimeLock
=========================
48f931  sync.(*Mutex).Unlock
48f918  main.main.func1
46ba81  runtime.goexit
=========================
48f931  sync.(*Mutex).Unlock
48f918  main.main.func1
46ba81  runtime.goexit

1.22:

=========================
4401a1  runtime._LostContendedRuntimeLock
=========================
48f931  sync.(*Mutex).Unlock
48f918  main.main.func1
46ba81  runtime.goexit

@korniltsev (Collaborator, Author)

This is not necessarily a runtime bug, but I've filed a Go issue anyway to make sure they are aware of this (implicit?) change: golang/go#67548

I think it makes sense to defend godeltaprof against this change even if it is fixed upstream.

@korniltsev (Collaborator, Author)

There is a similar issue with the heap profile: map internals are elided from the end of call stacks, so we can get multiple entries with the same stack in the pprof.

If that is the case, we should have a test for it; a rough sketch follows.
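
One way to check whether this actually happens (an illustrative sketch, not the real godeltaprof test) is to scan runtime.MemProfile output for duplicate stacks:

package compat

import (
	"runtime"
	"testing"
)

// TestHeapProfileDuplicateStacks reports any pair of heap profile
// records that share a call stack (sketch only; names are made up).
func TestHeapProfileDuplicateStacks(t *testing.T) {
	records := make([]runtime.MemProfileRecord, 1<<10)
	n, ok := runtime.MemProfile(records, true)
	if !ok {
		t.Skip("record slice too small; grow it and retry in a real test")
	}
	seen := make(map[[32]uintptr]int, n) // a MemProfileRecord holds at most 32 PCs
	for i, r := range records[:n] {
		var key [32]uintptr
		copy(key[:], r.Stack())
		if j, dup := seen[key]; dup {
			t.Logf("records %d and %d share a stack: %x", j, i, r.Stack())
		}
		seen[key] = i
	}
}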

korniltsev added a commit that referenced this issue Jun 4, 2024
Fixes #103

This PR fixes an issue where a heap/mutex profile contains multiple records with equal stacks, by accumulating values for duplicate stacks before computing the delta.
The issue was found in tests for mutex profiles (Go started producing two profile records for the same mutex; see the linked issue for more info).
I'm not sure if it is possible to trigger the issue for heap.

The PR turned out big because I refactored test helpers for reuse. I can split it into 2 or 3 smaller PRs if needed.

This PR also drops support for Go 1.16 and 1.17, as I wanted to use generics in prof/map.go.
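
The core of the defense is to merge records that share a stack before computing deltas. A rough sketch of the idea (the actual implementation lives in godeltaprof's internal prof/map.go and uses generics to share the logic across profile types; names here are illustrative):

package prof

import "runtime"

// stackKey is a comparable key built from a record's call stack;
// runtime profile records hold at most 32 PCs.
type stackKey [32]uintptr

func keyOf(stk []uintptr) stackKey {
	var k stackKey
	copy(k[:], stk)
	return k
}

// mergeMutexRecords accumulates Count and Cycles across records that
// share the same call stack, so the delta computation sees at most
// one record per stack.
func mergeMutexRecords(records []runtime.BlockProfileRecord) []runtime.BlockProfileRecord {
	out := make([]runtime.BlockProfileRecord, 0, len(records))
	index := make(map[stackKey]int, len(records))
	for _, r := range records {
		k := keyOf(r.Stack())
		if i, ok := index[k]; ok {
			out[i].Count += r.Count
			out[i].Cycles += r.Cycles
			continue
		}
		index[k] = len(out)
		out = append(out, r)
	}
	return out
}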

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope-go/godeltaprof/compat
                                      │   old.txt   │               new.txt                │
                                      │   sec/op    │    sec/op      vs base               │
HeapCompression-8                       691.1µ ± 0%    729.8µ ± 65%   +5.59% (p=0.002 n=6)
MutexCompression/ScalerMutexProfile-8   640.6µ ± 1%    693.8µ ±  9%   +8.30% (p=0.002 n=6)
MutexCompression/ScalerBlockProfile-8   643.2µ ± 1%    690.3µ ±  1%   +7.33% (p=0.002 n=6)
HeapDelta-8                             76.26µ ± 1%   113.92µ ±  2%  +49.38% (p=0.002 n=6)
MutexDelta/ScalerMutexProfile-8         48.98µ ± 1%    81.11µ ±  2%  +65.59% (p=0.002 n=6)
MutexDelta/ScalerBlockProfile-8         49.03µ ± 1%    81.14µ ±  1%  +65.49% (p=0.002 n=6)
geomean                                 193.3µ         253.0µ        +30.87%

                                      │   old.txt    │               new.txt               │
                                      │     B/op     │     B/op      vs base               │
HeapCompression-8                       974.1Ki ± 0%   973.9Ki ± 0%   -0.03% (p=0.002 n=6)
MutexCompression/ScalerMutexProfile-8   974.6Ki ± 0%   974.3Ki ± 0%   -0.02% (p=0.002 n=6)
MutexCompression/ScalerBlockProfile-8   974.6Ki ± 0%   974.3Ki ± 0%   -0.03% (p=0.002 n=6)
HeapDelta-8                              333.00 ± 0%     53.00 ± 0%  -84.08% (p=0.002 n=6)
MutexDelta/ScalerMutexProfile-8          312.00 ± 0%     30.00 ± 0%  -90.38% (p=0.002 n=6)
MutexDelta/ScalerBlockProfile-8          312.00 ± 0%     30.00 ± 0%  -90.38% (p=0.002 n=6)
geomean                                 17.42Ki        5.874Ki       -66.28%

                                      │  old.txt   │              new.txt              │
                                      │ allocs/op  │ allocs/op   vs base               │
HeapCompression-8                       788.0 ± 0%   787.0 ± 0%   -0.13% (p=0.002 n=6)
MutexCompression/ScalerMutexProfile-8   788.0 ± 0%   787.0 ± 0%   -0.13% (p=0.002 n=6)
MutexCompression/ScalerBlockProfile-8   788.0 ± 0%   787.0 ± 0%   -0.13% (p=0.002 n=6)
HeapDelta-8                             2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.002 n=6)
MutexDelta/ScalerMutexProfile-8         2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.002 n=6)
MutexDelta/ScalerBlockProfile-8         2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.002 n=6)
geomean                                 39.70        28.05       -29.33%

So it looks like the cost of deduplication is about 5% of the overall time (including compression), which is not that bad.
@stefanb commented Jun 26, 2024

A release should probably be tagged so that it can be referenced from dependents (including Grafana) and be buildable with the upcoming Go 1.23.

See build attempt in homebrew:

https://github.com/Homebrew/homebrew-core/actions/runs/9617673150/job/26535459459#step:3:10143

  go: downloading github.com/grafana/pyroscope-go/godeltaprof v0.1.6
  go: downloading github.com/grafana/authlib v0.0.0-20240319083410-9d4a6e3861e5
  # github.com/grafana/pyroscope-go/godeltaprof/internal/pprof
  /Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/grafana/pyroscope-go/[email protected]/internal/pprof/delta_mutex.go:30:20: undefined: runtime_cyclesPerSecond
  /Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/grafana/pyroscope-go/[email protected]/internal/pprof/proto.go:320:8: undefined: runtime_expandFinalInlineFrame

@nsrip-dd commented Jul 3, 2024

👋 I'm working on an upstream fix for this. What I have so far is here: https://go-review.googlesource.com/c/go/+/595966. Is it alright if we use the reproducer you created as a regression test case? If so, I can of course add any attribution you think is appropriate to the CL :)

@korniltsev (Collaborator, Author)

Sure, feel free to use the reproducer if it suits your needs.

gopherbot pushed a commit to golang/go that referenced this issue Jul 9, 2024
When using frame pointer unwinding, we defer frame skipping and inline
expansion for call stacks until profile reporting time. We can end up
with records which have different stacks if no frames are skipped, but
identical stacks once skipping is taken into account. Returning multiple
records with the same stack (but different values) has broken programs
which rely on the records already being fully aggregated by call stack
when returned from runtime.MutexProfile. This CL addresses the problem
by handling skipping at recording time. We do full inline expansion to
correctly skip the desired number of frames when recording the call
stack, and then handle the rest of inline expansion when reporting the
profile.

The regression test in this CL is adapted from the reproducer in
grafana/pyroscope-go#103, authored by Tolya
Korniltsev.

Fixes #67548

Co-Authored-By: Tolya Korniltsev <[email protected]>
Change-Id: I6a42ce612377f235b2c8c0cec9ba8e9331224b84
Reviewed-on: https://go-review.googlesource.com/c/go/+/595966
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Carlos Amedee <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Auto-Submit: Carlos Amedee <[email protected]>
Reviewed-by: Felix Geisendörfer <[email protected]>
gopherbot pushed a commit to golang/go that referenced this issue Jul 17, 2024
When using frame pointer unwinding, we defer frame skipping and inline
expansion for call stacks until profile reporting time. We can end up
with records which have different stacks if no frames are skipped, but
identical stacks once skipping is taken into account. Returning multiple
records with the same stack (but different values) has broken programs
which rely on the records already being fully aggregated by call stack
when returned from runtime.MutexProfile. This CL addresses the problem
by handling skipping at recording time. We do full inline expansion to
correctly skip the desired number of frames when recording the call
stack, and then handle the rest of inline expansion when reporting the
profile.

The regression test in this CL is adapted from the reproducer in
grafana/pyroscope-go#103, authored by Tolya
Korniltsev.

Fixes #67548

This reapplies CL 595966.
The original version of this CL introduced a bounds error in
MutexProfile and failed to correctly expand inlined frames from that
call. This CL applies the original CL, fixing the bounds error and
adding a test for the output of MutexProfile to ensure the frames are
expanded properly.

Change-Id: I5ef8aafb9f88152a704034065c0742ba767c4dbb
Reviewed-on: https://go-review.googlesource.com/c/go/+/598515
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Carlos Amedee <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>