proposal: runtime/pprof: add data-type profiling #69699

florianl · 2024-09-28T15:24:39Z

Proposal Details

Through field reordering and padding, static analysis can help to improve the memory layout of Go structs. This can make access to struct fields more efficient, because the fields are better aligned and less space is lost to padding. Combined with dead code analysis, static analysis can also identify unused fields in structs and so help to reduce the size of structs.
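As a small illustration of the layout effect described above, the following sketch compares two declarations of the same three fields. The type names `padded` and `reordered` are made up for this example; the exact sizes depend on the target platform, so the program prints them instead of assuming them.

```go
package main

import (
	"fmt"
	"unsafe"
)

// padded interleaves small and large fields, so the compiler has to insert
// padding to keep the int64 aligned and to round up the struct size.
type padded struct {
	a bool
	b int64
	c bool
}

// reordered holds the same fields with the largest one first, which lets
// the two bools share the space that was padding before.
type reordered struct {
	b int64
	a bool
	c bool
}

// sizes returns the size in bytes of both layouts on the current platform.
func sizes() (uintptr, uintptr) {
	return unsafe.Sizeof(padded{}), unsafe.Sizeof(reordered{})
}

func main() {
	p, r := sizes()
	fmt.Printf("padded: %d bytes, reordered: %d bytes\n", p, r)
}
```

Static analysis can find this kind of saving, but it cannot tell which fields are hot at run time, which is the gap this proposal is about.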

This proposal tries to introduce the ideas from Data-type profiling for perf to Go's pprof ecosystem to provide a Go native approach. Today it is already possible with perf on Linux systems to do data-type profiling, reorder structs accordingly, and benefit from the performance improvements.

Introduce a new runtime/pprof Profile that tracks the number of read/write accesses to fields within a Go struct.

The report of this new runtime/pprof Profile should enable users to identify frequently accessed fields within a struct, so that they can reorder struct fields to improve the memory efficiency of their application.

Example report for a Go struct, generated by the approach described in Data-type profiling for perf:

Annotate type: 'struct runtime.mspan' (654 samples)
Percent     Offset       Size  Field
 100.00          0        160  struct runtime.mspan {
   0.00          0          0      internal/runtime/sys.NotInHeap   _ {
   0.00          0          0          internal/runtime/sys.nih     _;
                                   };
   1.05          0          8      runtime.mspan*   next;
   0.00          8          8      runtime.mspan*   prev;
   0.23         16          8      runtime.mSpanList*       list;
  41.18         24          8      uintptr  startAddr;
   2.30         32          8      uintptr  npages;
   0.19         40          8      runtime.gclinkptr        manualFreeList;
   1.74         48          2      uint16   freeindex;
   1.57         50          2      uint16   nelems;
   0.23         52          2      uint16   freeIndexForScan;
   1.82         56          8      uint64   allocCache;
   1.56         64          8      runtime.gcBits*  allocBits;
   5.51         72          8      runtime.gcBits*  gcmarkBits;
   0.42         80          8      runtime.gcBits*  pinnerBits;
   1.54         88          4      uint32   sweepgen;
   4.58         92          4      uint32   divMul;
   2.70         96          2      uint16   allocCount;
  12.49         98          1      runtime.spanClass        spanclass;
   0.00         99          1      runtime.mSpanStateBox    state {
   0.00         99          1          internal/runtime/atomic.Uint8        s {
   0.00         99          0              internal/runtime/atomic.noCopy   noCopy;
   0.00         99          1              uint8    value;
                                       };
                                   };
   1.69        100          1      uint8    needzero;
   0.11        101          1      bool     isUserArenaChunk;
   0.23        102          2      uint16   allocCountBeforeCache;
  18.64        104          8      uintptr  elemsize;
   0.00        112          8      uintptr  limit;
   0.00        120          8      runtime.mutex    speciallock {
   0.00        120          0          runtime.lockRankStruct       lockRankStruct;
   0.00        120          8          uintptr      key;
                                   };
   0.22        128          8      runtime.special* specials;
   0.00        136         16      runtime.addrRange        userArenaChunkFree {
   0.00        136          8          runtime.offAddr      base {
   0.00        136          8              uintptr  a;
                                       };
   0.00        144          8          runtime.offAddr      limit {
   0.00        144          8              uintptr  a;
                                       };
                                   };
   0.00        152          8      internal/abi.Type*       largeType;
                               };

The example above reports the field accesses of the Go-internal struct mspan while running the benchmarks in net/http with go version devel go1.24-eb6f2c24cd Sat Sep 28 01:07:09 2024 +0000 linux/amd64.
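One way to read such a report is to map the hottest fields onto cache lines. The sketch below does this for the fields above 4% in the mspan report. It assumes a 64-byte cache line and that the struct starts on a cache-line boundary; neither is stated in the report, so treat the result as an illustration of the arithmetic only.

```go
package main

import "fmt"

// cacheLineSize is an ASSUMPTION: 64 bytes is common on amd64, but the
// report itself does not state it.
const cacheLineSize = 64

// structSize is the size of runtime.mspan as shown in the report.
const structSize = 160

type field struct {
	name    string
	offset  int
	percent float64
}

// hotFields copies the fields above 4% from the report.
var hotFields = []field{
	{"startAddr", 24, 41.18},
	{"gcmarkBits", 72, 5.51},
	{"divMul", 92, 4.58},
	{"spanclass", 98, 12.49},
	{"elemsize", 104, 18.64},
}

// cacheLine returns the index of the cache line that holds the byte at
// offset, assuming the struct begins on a cache-line boundary.
func cacheLine(offset int) int {
	return offset / cacheLineSize
}

func main() {
	perLine := map[int]float64{}
	for _, f := range hotFields {
		line := cacheLine(f.offset)
		perLine[line] += f.percent
		fmt.Printf("%-12s offset %3d -> cache line %d\n", f.name, f.offset, line)
	}
	for line := 0; line <= cacheLine(structSize-1); line++ {
		fmt.Printf("cache line %d: %.2f%% of samples (hot fields only)\n", line, perLine[line])
	}
}
```

Under these assumptions, startAddr falls into the first cache line while the other hot fields share the second one.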

Alternative

Instead of introducing a new runtime/pprof Profile, an approach similar to go build -cover could be used. At build time, accesses to fields in Go structs could be instrumented, and a report would be generated when executing the resulting Go binary. The resulting report could then be used by go tool cover to report the number of times a field in a struct was accessed.
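The following hand-written sketch shows roughly what such build-time instrumentation could amount to. Everything here is hypothetical: the `request` type, the `fieldCounters` table, and the placement of the counter increments are invented for illustration and do not reflect how go build -cover or any existing tool rewrites code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// request is an example struct whose field accesses are counted.
type request struct {
	id      int
	payload string
}

// fieldCounters is a HYPOTHETICAL counter table that an instrumenting
// build step might generate, with one counter per struct field.
var fieldCounters struct {
	id      atomic.Uint64
	payload atomic.Uint64
}

// handle reads both fields. The Add calls stand in for the code that the
// instrumentation would insert before each field access.
func handle(r *request) int {
	fieldCounters.id.Add(1)
	n := r.id

	fieldCounters.payload.Add(1)
	n += len(r.payload)

	return n
}

func main() {
	r := &request{id: 1, payload: "hello"}
	for i := 0; i < 3; i++ {
		handle(r)
	}
	fmt.Printf("request.id accesses: %d\n", fieldCounters.id.Load())
	fmt.Printf("request.payload accesses: %d\n", fieldCounters.payload.Load())
}
```

Unlike the sampling approach used by perf, counters like these report exact numbers, at the price of an atomic increment on every instrumented access.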

Question

I lack the knowledge of Go runtime internals needed to provide a proof of concept with this proposal.

Should runtime internal Go structs be exposed as well with data type profiling?
Should the profiling of Go structs differentiate between publicly exposed fields and non-public internal fields?
Is it possible and safe to turn on/off data-type profiling during runtime?
Should the profile collect samples of field access, similar to the perf approach, or count and report exact numbers?


gabyhelp · 2024-09-28T15:24:50Z

Related Issues and Documentation

Go Wiki: Debugging performance issues in Go programs > Memory Profiler
runtime: sloppy struct field arrangement could be re-arranged for more compact structs and save RAM #42412 (closed)
proposal: runtime/pprof: add goroutine stack memory usage profile #66566
proposal: runtime/pprof: add new WithLabels* function that requires fewer allocations #33701
runtime/pprof,net/http/pprof: improve delta profiles efficiency and correctness #67942
proposal: runtime/pprof: add “heaptime” bytes*GCs memory profile #55900
runtime/pprof: heap/allocs profile with legacy format yields different results from the profile with proto format #25096 (closed)

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

prattmic · 2024-09-28T22:48:16Z

This is a very intriguing type of profile I’ve never heard of before.

Do you intend that this profile would work the same way as the linked perf profile type? That is, using a precise “memory access” (or memory load) hardware PMU metric.

Aside: I found the patch message to be the most straightforward and concise summary of how that profile works: https://lwn.net/Articles/954938/

Along those lines, do you know if the existing perf profile works on Go programs? I don’t see fundamental reasons it shouldn’t, but we may be missing some DWARF. So even if we don’t add a profile to runtime/pprof, fixing up problems with perf profiles may be doable.
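A quick way to see whether a Go binary carries DWARF type information at all is to list the struct types it describes. The sketch below uses the standard debug/elf and debug/dwarf packages and therefore only handles ELF binaries. It checks only for the presence of struct type entries; it does not verify everything perf needs for data-type profiling, such as variable location information.

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"os"
)

// structTypeNames returns the names of all struct types described in the
// DWARF data of the ELF binary at path.
func structTypeNames(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	d, err := f.DWARF()
	if err != nil {
		return nil, err
	}

	var names []string
	r := d.Reader()
	for {
		e, err := r.Next()
		if err != nil {
			return nil, err
		}
		if e == nil {
			break // end of DWARF entries
		}
		if e.Tag != dwarf.TagStructType {
			continue
		}
		if name, ok := e.Val(dwarf.AttrName).(string); ok {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: dwarfstructs <elf-binary>")
		return
	}
	names, err := structTypeNames(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("%d named struct types found in DWARF\n", len(names))
}
```

A binary linked with -ldflags=-w has its DWARF omitted, so the count of struct types is expected to be lower or zero for such builds.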

prattmic · 2024-09-28T22:48:35Z

cc @golang/runtime

florianl · 2024-09-29T08:31:13Z

> Do you intend that this profile would work the same way as the linked perf profile type? That is, using a precise “memory access” (or memory load) hardware PMU metric.

Implementing this new profile based on PMU metrics would benefit accuracy, I think. I lack the knowledge of Go runtime internals to tell whether it could be implemented without PMU metrics.

> Along those lines, do you know if the existing perf profile works on Go programs?

I use perf whenever it is available, and so far I have neither run into issues nor missed any information when profiling Go executables. The example of struct runtime.mspan in the initial post of this proposal was generated by perf.
To my knowledge, perf is not available on every OS; for example, I'm not aware of perf on Windows. Also, perf is often not deployed to production systems. Therefore, the Go ecosystem would benefit from the insights of this new profile if it were integrated natively.

prattmic · 2024-09-30T15:29:45Z

It's great to hear that the perf tool seems to work well.

#36821 and #53286 cover providing PMU-based profiles in Go, though those are targeted at the more typical profiles (cycles, instructions, etc). Cross-platform support is discussed there as well. I believe the summary is that Linux of course has the perf events API, Windows has an API, though none of us are familiar with it, and macOS does not seem to have a (public) API at all.

florianl added the Proposal label Sep 28, 2024

gopherbot added this to the Proposal milestone Sep 28, 2024

ianlancetaylor added this to Proposals Sep 29, 2024

ianlancetaylor moved this to Incoming in Proposals Sep 29, 2024

florianl mentioned this issue Nov 4, 2024

WIP: Proposal: Support Timestamped Profiling Events open-telemetry/opentelemetry-proto#594
