Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: MutexProfile has multiple records with equal stack #67548

Closed
korniltsev opened this issue May 21, 2024 · 12 comments
Closed

runtime: MutexProfile has multiple records with equal stack #67548

korniltsev opened this issue May 21, 2024 · 12 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@korniltsev
Copy link
Contributor

Go version

go1.23-f4494522dc

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/korniltsev/.cache/go-build'
GOENV='/home/korniltsev/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/korniltsev/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/korniltsev/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/korniltsev/github/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/korniltsev/github/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.23-f4494522dc Mon May 13 16:42:35 2024 +0000'
GODEBUG=''
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/korniltsev/github/go/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1551311124=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I run this program on gotip https://go.dev/play/p/YD8N3UVACna?v=gotip and I get different results on gotip and 1.22

I think the reason is before the change this line of code

b := stkbucket(which, 0, stk, true)

used stack from callers / gcallers which does skip logical frames and performs inline unwinding and looks like this

[
   0x48f931, //  sync.(*Mutex).Unlock
   0x48f918,  // main.main.func1
   0x46ba81,  // runtime.goexit
]

And after the change the same line uses stack from fpTracebackPCs
which looks like this:

[
   0x4,      // skip value
   0x467405, // sync.event 
   0x44ad65, // runtime.semrelease1
   0x468533, // sync.runtime_Semrelease 
   0x476b5b, // sync.(*Mutex).unlockSlow << this function has a branch, so value here depends on Mutex state
   0x48f931, // main.main.func1
   0x46ba81, // runtime.goexit
]

Because of that, runtime.stkbucket allocates two bucket s for the same logical stack and therefore later we get two BlockProfileRecord with equal stack.

I am not exactly sure this is a bug and how severe it is. It looks like it may lead to memory usage increase.

What did you see happen?

I see multiple BlockProfileRecord with equal stack.

=========================
4905d1  sync.(*Mutex).Unlock
4905b8  main.main.func1
46b241  runtime.goexit
=========================
4905d1  sync.(*Mutex).Unlock
4905b8  main.main.func1
46b241  runtime.goexit

What did you expect to see?

If I run on 1.22 I get only one BlockProfileRecord

=========================
482751  sync.(*Mutex).Unlock
482738  main.main.func1
4622c1  runtime.goexit
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label May 21, 2024
@korniltsev
Copy link
Contributor Author

cc @nsrip-dd

@nsrip-dd
Copy link
Contributor

Thanks for raising this, I hadn't considered that when sending the CL. Technically these are different stacks, as you observe, it's just that the choice of how much to skip makes them look the same. There is a similar issue with the heap profile, where map internals are elided from the end of call stacks and we can get multiple entries with the same stack in the pprof.

A few things come to mind for how to address this:

  • We could deduplicate/merge the records before returning them.
  • We could handle skipping/inlining at sample time again. This would somewhat defeat the purpose of switching to frame pointer unwinding, where we ideally would keep the recording overhead low and defer the more expensive stack handling logic to reporting time, after we've already aggregated the data.
  • We could skip less, maybe even just up to mutexevent where the sample is recorded, and these would no longer be duplicates. That might warrant a proposal, though.

@korniltsev
Copy link
Contributor Author

I'll just braindump some more ideas:

  • We can change (m *Mutex) unlockSlow implementation to call runtime_Semrelease only once, unconditionally as the last expression before exit. This way return address on the stack will be the same for both starving and non starving mutexes. Not sure about other cases (rwmutex, waitgroup)
  • We can mark (m *Mutex) unlockSlow as noinline and teach saveblockevent to skip until specific function instead of number frames

@mknyszek mknyszek added this to the Backlog milestone May 22, 2024
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 23, 2024
@rsc
Copy link
Contributor

rsc commented Jun 27, 2024

The discussion in grafana/pyroscope-go#103 makes this look like a Go 1.23 regression to me.

@rsc rsc modified the milestones: Backlog, Go1.23 Jun 27, 2024
@felixge
Copy link
Contributor

felixge commented Jul 2, 2024

@rsc ack. We'll try to submit a patch that restores the old behavior by deduplicating the returned records. This will make runtime.MutexProfile() a little more expensive, but not cause any user code to potentially break. @nsrip-dd will work on this.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/595966 mentions this issue: runtime: aggregate records with identical stacks in MutexProfile

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. FixPending Issues that have a fix which has not yet been reviewed or submitted. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jul 3, 2024
@thanm
Copy link
Contributor

thanm commented Jul 3, 2024

Any updates here?

@nsrip-dd
Copy link
Contributor

nsrip-dd commented Jul 3, 2024

Yes, I've sent https://go-review.googlesource.com/c/go/+/595966 implementing my first suggested fix, to merge records with the same call stack before returning them from MutexProfile. The CL needs review, though it looks like it hasn't been automatically assigned reviewers as of this writing.

@cherrymui
Copy link
Member

Thanks @nsrip-dd ! I can review the CL.

@rhysh
Copy link
Contributor

rhysh commented Jul 3, 2024

I gave it a non-maintainer pass; I'll leave it to @cherrymui .

@cherrymui
Copy link
Member

cherrymui commented Jul 15, 2024

CL 595966 is reverted. Reopen.

@cherrymui cherrymui reopened this Jul 15, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in Go Compiler / Runtime Jul 15, 2024
@dmitshur dmitshur moved this from Done to In Progress in Release Blockers Jul 15, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/598515 mentions this issue: runtime: avoid multiple records with identical stacks from MutexProfile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
Status: Done
Development

No branches or pull requests