Leak Sanitizer Segfaulting in CI #22215

Open
wernerpe opened this issue Nov 19, 2024 · 4 comments · May be fixed by #22232
Labels
component: build system Bazel, CMake, dependencies, memory checkers, linters priority: medium type: bug

Comments

@wernerpe
Contributor

wernerpe commented Nov 19, 2024

What happened?

We found confusing behavior while looking into one of the CI tests that runs under the leak sanitizer. In the planning directory, the visibility graph test and the IRIS ZO test (#22168) fail sporadically in CI when running 'linux-jammy-clang-bazel-experimental-leak-sanitizer'. To reproduce the error locally, we ran something like

bazel test --runs_per_test=10 --config=clang --compilation_mode=dbg --config=lsan //planning:visibility_graph_test

on Ubuntu 22.04 and found that typically 2-3 out of 10 runs produced the segfault. From what we could tell, the segfault is triggered before entering the test body, and only when more than a single thread was requested.

The commit SHA I have added below points to a commit on my fork of Drake (from which I opened PR #22168).

Version

34437bc

What operating system are you using?

Ubuntu 22.04

What installation option are you using?

No response

Relevant log output

No response

@rpoyner-tri rpoyner-tri added the component: build system Bazel, CMake, dependencies, memory checkers, linters label Nov 19, 2024
@calderpg-tri
Contributor

Some additional information:

  • segfaults are reproducible both with and without OpenMP enabled in build
  • segfaults are reproducible with only one thread specified (e.g. DRAKE_NUM_THREADS=1)
  • the only segfault backtraces @sammy-tri and I could reliably produce were deep in LSAN startup code, well before any log messages were printed (which is well before anything parallel gets run in either test)

@calderpg-tri
Contributor

I was able to reproduce the segfault on 24.04 using clang-15. After switching to clang-18 on 24.04, I was unable to reproduce the segfault in 1000 runs of the test. I am inclined to say this is an LSAN bug.
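
As a rough sanity check on that conclusion, here is a minimal sketch of the arithmetic. It assumes that runs are independent and that the 2-3 in 10 failure rate reported at the top of the issue would carry over to this setup. Neither assumption is established in this thread, and the function names are made up for illustration.

```python
def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that every one of `runs` independent runs passes."""
    return (1.0 - failure_rate) ** runs


def upper_bound_failure_rate(runs: int, confidence: float = 0.95) -> float:
    """Largest per-run failure rate still consistent, at the given
    confidence level, with observing zero failures in `runs` runs."""
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)


# Assumed rate: 0.2 is the low end of the "2-3 out of 10" figure.
chance_of_clean_streak = prob_all_pass(0.2, 1000)

# With zero failures in 1000 runs, the true rate is very likely below this.
bound = upper_bound_failure_rate(1000)

print(f"P(1000 clean runs at 20% failure rate) = {chance_of_clean_streak:.3g}")
print(f"95% upper bound on failure rate        = {bound:.4f}")
```

Under these assumptions, a clean streak of 1000 runs would be essentially impossible if the old failure rate still applied, and the remaining failure rate is bounded at roughly 0.3% per run. That supports the toolchain change making the difference, but it does not by itself identify the root cause.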

@jwnimmer-tri
Collaborator

Locally, using coredumpctl to capture the core, I see this in the backtrace of the segfault:

(lldb) bt
* thread #1, name = 'visibility_grap', stop reason = signal SIGSEGV
  * frame #0: 0x0000636a0bf6e781 visibility_graph_test`__sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) + 33
    frame #1: 0x0000636a0bf7021a visibility_graph_test`__sanitizer::MmapNamed(void*, unsigned long, int, int, char const*) + 58
    frame #2: 0x0000636a0bf7a038 visibility_graph_test`__sanitizer::ReservedAddressRange::Init(unsigned long, char const*, unsigned long) + 40
    frame #3: 0x0000636a0bf8c66e visibility_graph_test`__sanitizer::SizeClassAllocator64<__lsan::AP64<__sanitizer::LocalAddressSpaceView> >::Init(int, unsigned long) + 126
    frame #4: 0x0000636a0bf8ae1f visibility_graph_test`__lsan::InitializeAllocator() + 63
    frame #5: 0x0000636a0bf8acae visibility_graph_test`__lsan_init + 302

@jwnimmer-tri jwnimmer-tri self-assigned this Nov 22, 2024
@jwnimmer-tri
Collaborator

Taking a hint from https://iree.dev/developers/debugging/sanitizers/#tsan-threadsanitizer, I think it's because our kernel's ASLR setting, vm.mmap_rnd_bits, is too large for lsan to handle:

jwnimmer@call-cps:~$ sudo sysctl vm.mmap_rnd_bits
vm.mmap_rnd_bits = 32

I have a fix that seems to pass repeated testing; I'll open a PR once I'm satisfied.
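
To check whether another machine is in the same situation, the sysctl output can be parsed and compared against a limit. The sketch below is illustrative only. The limit of 28 bits is an assumption: it is a commonly cited workaround value for older sanitizer runtimes, not something stated in this thread or taken from the eventual fix. The function names are hypothetical.

```python
import re

# ASSUMPTION: 28 bits is a commonly cited workaround value for older
# sanitizer runtimes; it does not come from this issue or its fix.
ASSUMED_MAX_BITS = 28


def parse_mmap_rnd_bits(sysctl_line: str) -> int:
    """Extract the integer from a line like 'vm.mmap_rnd_bits = 32'."""
    match = re.fullmatch(r"\s*vm\.mmap_rnd_bits\s*=\s*(\d+)\s*", sysctl_line)
    if match is None:
        raise ValueError(f"unexpected sysctl output: {sysctl_line!r}")
    return int(match.group(1))


def exceeds_assumed_limit(bits: int, limit: int = ASSUMED_MAX_BITS) -> bool:
    """True when the ASLR entropy setting is above the assumed limit."""
    return bits > limit


# Example using the output quoted in the comment above.
bits = parse_mmap_rnd_bits("vm.mmap_rnd_bits = 32")
if exceeds_assumed_limit(bits):
    print(f"vm.mmap_rnd_bits = {bits} is above the assumed limit "
          f"of {ASSUMED_MAX_BITS}")
```

The parser takes the text of the sysctl output rather than reading the kernel setting itself, since reading it may require root (the command above was run with sudo).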

5 participants