Rust symbols are inconveniently shortened #5439

Open
bobrik opened this issue Jan 19, 2025 · 2 comments

bobrik commented Jan 19, 2025

Here's how Parca resolves a symbol (with a debug print added to SourceLines):

$ git diff
diff --git a/pkg/symbol/elfutils/debuginfofile.go b/pkg/symbol/elfutils/debuginfofile.go
index 4a9ab2cf2..1c5a81aa9 100644
--- a/pkg/symbol/elfutils/debuginfofile.go
+++ b/pkg/symbol/elfutils/debuginfofile.go
@@ -102,6 +102,7 @@ func (f *debugInfoFile) SourceLines(addr uint64) ([]profile.LocationLine, error)
        }

        file, line := findLineInfo(f.lineEntries[cu.Offset], tr.Ranges)
+       fmt.Printf("addr = 0x%x, name = %s, file = %s, line = %d\n", addr, name, file, line)
        lines = append(lines, profile.LocationLine{
                Line: line,
                Function: f.demangler.Demangle(&pb.Function{

This prints:

addr = 0x17e976, name = small_slot_len, file = ?, line = 0

Compare this to addr2line from elfutils:

$ eu-addr2line -e ./target/debug/weird-unwind -S 0x17e976
_ZN14regex_automata4util8captures14GroupInfoInner14small_slot_len17h3554d4e8237ec643E+0x16
/home/ivan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/regex-automata-0.4.9/src/util/captures.rs:2323:9

From Parca's output you can't even tell that this has anything to do with regexps, but it does:

$ echo _ZN14regex_automata4util8captures14GroupInfoInner14small_slot_len17h3554d4e8237ec643E+0x16 | c++filt
regex_automata::util::captures::GroupInfoInner::small_slot_len::h3554d4e8237ec643+0x16

Compare the flamegraph view in Parca (new and build are very opaque!):

Image

To what perf manages to do:

Image

It's interesting to note that some symbol names arrive already demangled before the demangler runs, for example:

addr = 0x3659e6, name = min<core::iter::adapters::map::Map<core::slice::iter::Iter<regex_syntax::hir::literal::Literal>, regex_syntax::hir::literal::{impl#4}::min_literal_len::{closure_env#0}>>, file = /rustc/48a426eca9df23b24b3559e545cf88dee61d4de9/library/core/src/iter/traits/iterator.rs, line = 3143

bobrik commented Jan 19, 2025

Looking through what the DWARF entry provides:

(dlv) p tr.Entry
github.com/go-delve/delve/pkg/dwarf/godwarf.Entry(github.com/go-delve/delve/pkg/dwarf/godwarf.compositeEntry) [
	*{
		Offset: 1051394,
		Tag: TagSubprogram (46),
		Children: true,
		Field: []debug/dwarf.Field len: 4, cap: 4, [
			(*"debug/dwarf.Field")(0x140014ce680),
			(*"debug/dwarf.Field")(0x140014ce6a0),
			(*"debug/dwarf.Field")(0x140014ce6c0),
			(*"debug/dwarf.Field")(0x140014ce6e0),
		],},
	*{
		Offset: 980869,
		Tag: TagSubprogram (46),
		Children: true,
		Field: []debug/dwarf.Field len: 7, cap: 7, [
			(*"debug/dwarf.Field")(0x140014b5340),
			(*"debug/dwarf.Field")(0x140014b5360),
			(*"debug/dwarf.Field")(0x140014b5380),
			(*"debug/dwarf.Field")(0x140014b53a0),
			(*"debug/dwarf.Field")(0x140014b53c0),
			(*"debug/dwarf.Field")(0x140014b53e0),
			(*"debug/dwarf.Field")(0x140014b5400),
		],},
]

The first batch:

(dlv) print *(*"debug/dwarf.Field")0x140014ce680
Command failed: 1:24: expected 'EOF', found 0x140014ce680
(dlv) print *(*"debug/dwarf.Field")(0x140014ce680)
debug/dwarf.Field {
	Attr: AttrLowpc (17),
	Val: interface {}(uint64) 1567072,
	Class: ClassAddress (1),}
(dlv) print *(*"debug/dwarf.Field")(0x140014ce6a0)
debug/dwarf.Field {
	Attr: AttrHighpc (18),
	Val: interface {}(int64) 36,
	Class: ClassConstant (3),}
(dlv) print *(*"debug/dwarf.Field")(0x140014ce6c0)
debug/dwarf.Field {
	Attr: AttrFrameBase (64),
	Val: interface {}([]uint8) [87],
	Class: ClassExprLoc (4),}
(dlv) print *(*"debug/dwarf.Field")(0x140014ce6e0)
debug/dwarf.Field {
	Attr: AttrSpecification (71),
	Val: interface {}(debug/dwarf.Offset) 980869,
	Class: ClassReference (10),}

The second batch:

(dlv) print *(*"debug/dwarf.Field")(0x140014b5340)
debug/dwarf.Field {
	Attr: AttrLinkageName (110),
	Val: interface {}(string) "_ZN14regex_automata4util8captures14GroupInfoInner14small_slot_le...+21 more",
	Class: ClassString (12),}
(dlv) print *(*"debug/dwarf.Field")(0x140014b5360)
debug/dwarf.Field {
	Attr: AttrName (3),
	Val: interface {}(string) "small_slot_len",
	Class: ClassString (12),}
(dlv) print *(*"debug/dwarf.Field")(0x140014b5380)
debug/dwarf.Field {
	Attr: AttrDeclFile (58),
	Val: interface {}(int64) 47,
	Class: ClassConstant (3),}
(dlv) print *(*"debug/dwarf.Field")(0x140014b53a0)
debug/dwarf.Field {
	Attr: AttrDeclLine (59),
	Val: interface {}(int64) 2316,
	Class: ClassConstant (3),}
(dlv) print *(*"debug/dwarf.Field")(0x140014b53c0)
debug/dwarf.Field {
	Attr: AttrType (73),
	Val: interface {}(debug/dwarf.Offset) 979443,
	Class: ClassReference (10),}
(dlv) print *(*"debug/dwarf.Field")(0x140014b53e0)
debug/dwarf.Field {
	Attr: AttrDeclaration (60),
	Val: interface {}(bool) true,
	Class: ClassFlag (5),}
(dlv) print *(*"debug/dwarf.Field")(0x140014b5400)
debug/dwarf.Field {
	Attr: AttrExternal (63),
	Val: interface {}(bool) true,
	Class: ClassFlag (5),}

It seems that AttrLinkageName is what we want here rather than AttrName.

Looking at the elfutils code, that's exactly what's being used:

static const char *
get_diename (Dwarf_Die *die)
{
  Dwarf_Attribute attr;
  const char *name;

  name = dwarf_formstring (dwarf_attr_integrate (die, DW_AT_MIPS_linkage_name,
                                                 &attr)
                           ?: dwarf_attr_integrate (die, DW_AT_linkage_name,
                                                    &attr));

  if (name == NULL)
    name = dwarf_diename (die) ?: "??";

  return name;
}

Swapping this out in Parca itself gives me the following flamegraph:

Image

The problem now is that symbols aren't demangled properly. Taking a random one:

addr = 0x117da0, name = _ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h16960a2bf2e077d9E, file = /rustc/48a426eca9df23b24b3559e545cf88dee61d4de9/library/std/src/rt.rs, line = 195

It matches eu-addr2line:

$ eu-addr2line -e ./target/debug/weird-unwind -S 0x117da0
_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h16960a2bf2e077d9E+0x10
/rustc/48a426eca9df23b24b3559e545cf88dee61d4de9/library/std/src/rt.rs:195:18

And c++filt can demangle it with no issues:

$ echo '_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h16960a2bf2e077d9E' | c++filt
std::rt::lang_start::{{closure}}::h16960a2bf2e077d9
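
For reference, here is a rough sketch of how the legacy Rust mangling seen above can be decoded: `_ZN`, then length-prefixed path segments, then `E`. It is a toy written for this issue, not what Parca or c++filt actually run; `demangleLegacy` and `unescape` are made-up names, and only the `$u<hex>$` escape is handled (others such as `$LT$` or `..` are left as-is).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unescape decodes $u<hex>$ escapes inside one path segment and drops the
// leading '_' that precedes an escape at the start of a segment.
// Other legacy escapes ($LT$, $GT$, "..", ...) are NOT handled in this sketch.
func unescape(id string) string {
	if strings.HasPrefix(id, "_$") {
		id = id[1:]
	}
	var b strings.Builder
	for len(id) > 0 {
		if id[0] == '$' {
			if end := strings.IndexByte(id[1:], '$'); end >= 0 {
				esc := id[1 : 1+end]
				if len(esc) > 1 && esc[0] == 'u' {
					if r, err := strconv.ParseUint(esc[1:], 16, 32); err == nil {
						b.WriteRune(rune(r))
						id = id[end+2:]
						continue
					}
				}
			}
		}
		b.WriteByte(id[0])
		id = id[1:]
	}
	return b.String()
}

// demangleLegacy turns _ZN<len><segment>...E into a "::"-separated path.
// It reports false for anything that does not match that exact shape,
// including symbols with a trailing "+0x16" style offset.
func demangleLegacy(sym string) (string, bool) {
	if !strings.HasPrefix(sym, "_ZN") {
		return "", false
	}
	rest := sym[len("_ZN"):]
	var parts []string
	for len(rest) > 0 && rest[0] != 'E' {
		i := 0
		for i < len(rest) && rest[i] >= '0' && rest[i] <= '9' {
			i++
		}
		if i == 0 {
			return "", false
		}
		n, err := strconv.Atoi(rest[:i])
		if err != nil || n > len(rest)-i {
			return "", false
		}
		parts = append(parts, unescape(rest[i:i+n]))
		rest = rest[i+n:]
	}
	if rest != "E" || len(parts) == 0 {
		return "", false
	}
	return strings.Join(parts, "::"), true
}

func main() {
	out, ok := demangleLegacy("_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h16960a2bf2e077d9E")
	fmt.Println(out, ok)
}
```

The point is that the mangled linkage name carries the whole module path, which is why it is the more useful input for symbolization.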

It's probably worth a separate issue to figure this part out.

I'm happy to open a PR to start using AttrLinkageName when it's available and fall back to AttrName otherwise. Let me know what you think.

bobrik commented Jan 19, 2025

It's probably worth a separate issue to figure this part out.

It's because we feed Name instead of SystemName into the demangler, so it thinks that there's nothing to do.

With that fixed, I finally have my expected flamegraph:

Image

I'm also happy to open a PR to fix that bit as well if it makes sense.
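
The idea, as a minimal sketch: always hand the mangled SystemName to the demangler and only overwrite Name when demangling succeeds. The `Function` struct, `demangleFunction`, and the map-backed demangler below are all hypothetical stand-ins to keep the example self-contained; they are not Parca's types or its real demangler.

```go
package main

import "fmt"

// Function is a hypothetical stand-in for a symbolized function record
// with a display name and the raw (mangled) system name.
type Function struct {
	Name       string
	SystemName string
}

// demangleFunction feeds SystemName (not Name) to the demangler and keeps
// the existing Name when the demangler does not recognize the input.
func demangleFunction(fn Function, demangle func(string) (string, bool)) Function {
	if out, ok := demangle(fn.SystemName); ok {
		fn.Name = out
	}
	return fn
}

func main() {
	// Toy demangler backed by a lookup table, just for the demo.
	known := map[string]string{
		"_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h16960a2bf2e077d9E": "std::rt::lang_start::{{closure}}::h16960a2bf2e077d9",
	}
	lookup := func(s string) (string, bool) {
		out, ok := known[s]
		return out, ok
	}

	fn := Function{
		Name:       "{closure#0}", // short name, made up for the demo
		SystemName: "_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h16960a2bf2e077d9E",
	}
	fmt.Println(demangleFunction(fn, lookup).Name)
}
```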
