Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Cannot catch RMM exceptions thrown across DSO boundaries #1652

Closed
wence- opened this issue Aug 20, 2024 · 4 comments · Fixed by #1654
Closed

[BUG] Cannot catch RMM exceptions thrown across DSO boundaries #1652

wence- opened this issue Aug 20, 2024 · 4 comments · Fixed by #1654
Assignees
Labels
2 - In Progress Currently a work in progress bug Something isn't working

Comments

@wence-
Copy link
Contributor

wence- commented Aug 20, 2024

Describe the bug

Since #1644, the default symbol visibility for RMM is hidden.

This is a good thing, however, the stick was applied a bit too aggressively.

In particular: exception classes need to have default visibility if we want to be able to catch an error thrown from DSO-A in DSO-B (otherwise the linker won't have disambiguated the names and the typeinfos will not compare equal).

I'm also worried that we probably need to mark all public non-template classes as RMM_EXPORT as well, since again, the two classes will not be the same (though will at least be ABI compatible) when passed between DSOs.

@wence- wence- added bug Something isn't working ? - Needs Triage Need team to review and classify labels Aug 20, 2024
@wence-
Copy link
Contributor Author

wence- commented Aug 20, 2024

See https://www.akkadia.org/drepper/dsohowto.pdf page 22 and https://gcc.gnu.org/wiki/Visibility for some discussion of the issue with exceptions

@jameslamb
Copy link
Member

Thanks for opening this.

First, we should clarify what's meany by "for RMM". #1644 only changed the visibility of symbols for the Cython bits of the rmm Python package. It shouldn't have in any way affected the use of RMM as a header-only library.

I've seen those resources you'd linked, but I was assuming that within the Python package, there wouldn't be cross-DSO throwing of anything not already marked for export via Cython's mechanisms.

Could you share a reproducible example of a case that used to work and is now broken?

Full disclosure, I did #1644 but this is way at the edge of my C++ understanding, so hopefully @robertmaynard will be able to help as well.

@wence-
Copy link
Contributor Author

wence- commented Aug 20, 2024

libcudf #includes rmm headers and, in some circumstances will throw an exception. This will now have a different typeinfo from the exception that is in the Cython DSO, since the two are compiled with different visibility and visibility-hidden wins out. So I think you won't be able to catch it. I don't right now have a 24.10 to test with though.

@robertmaynard
Copy link
Contributor

It looks like the converstation we had #1644 wasn't sufficient ( #1644 (comment) ) .

Types such as rmm::cuda_error that inherit from std::runtime_error have vtables and therefore need to be marked as RMM_EXPORT so that you can throw one in CythonDSO_A and catch it in CythonDSO_B.
Likewise the host_memory_resource, device_memory_resource and all derived types would also need to be marked as RMM_EXPORT if we expect them to cross DSO boundaries.

@robertmaynard robertmaynard self-assigned this Aug 20, 2024
@robertmaynard robertmaynard added 2 - In Progress Currently a work in progress and removed ? - Needs Triage Need team to review and classify labels Aug 20, 2024
robertmaynard pushed a commit to robertmaynard/rmm that referenced this issue Aug 20, 2024
rapids-bot bot pushed a commit that referenced this issue Aug 22, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in RMM Project Board Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress bug Something isn't working
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants