Avoid allocating GPU memory out of RMM managed pool in test #9985
Conversation
Signed-off-by: Ferdinand Xu <[email protected]>
build
LGTM
Do we know why this just started failing? It seems like this should have been an issue for a while. Were we not using the ARENA allocator before? Or maybe we aren't explicitly setting it, and it was always on for machines with a newer CUDA version?
Yes, I think so. To reproduce this, I hardcoded the memory pool to arena.
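(For reference, one way to pin the pool type rather than relying on the CUDA-version-dependent default is a sketch like the following; the actual hardcoding in the test may have been done differently. spark.rapids.memory.gpu.pool is the documented spark-rapids config key.)

```java
import org.apache.spark.sql.SparkSession;

public class ForceArenaPool {
  public static void main(String[] args) {
    // Force the spark-rapids plugin to use the ARENA pool implementation
    // instead of whatever the default resolves to on this CUDA version.
    SparkSession spark = SparkSession.builder()
        .appName("arena-repro")
        .config("spark.rapids.memory.gpu.pool", "ARENA")
        .getOrCreate();

    // ... run the failing test path here ...

    spark.stop();
  }
}
```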
build
Do we need this to be targeted at 23.12, or, since this is test-only, can we just ignore it in 23.12?
My bad. It should target 23.12 and be merged back to 24.02. Let me re-target this.
build
This closes #9982.
The root cause is that within our test we create a GPU column vector via the Spark-Rapids-JNI API ColumnVector.fromBytes before RMM is initialized, but we close it inside a Spark GPU session after RMM has been initialized. RMM then mistakenly tries to deallocate the column vector at a memory address that is not visible to it, because the buffer was allocated before RMM took over device memory management.
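For illustration, a minimal standalone sketch of the lifetime mismatch (hypothetical code, not the actual test; the Rmm.initialize overload, null log config, and pool size are assumptions):

```java
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Rmm;
import ai.rapids.cudf.RmmAllocationMode;

public class PreRmmAllocationRepro {
  public static void main(String[] args) {
    // Allocated with the default CUDA allocator because RMM is not
    // initialized yet, so no RMM pool tracks this address.
    ColumnVector preRmm = ColumnVector.fromBytes((byte) 1, (byte) 2, (byte) 3);

    // RMM now installs the arena pool as the device memory resource.
    Rmm.initialize(RmmAllocationMode.ARENA, null, 512L * 1024 * 1024);

    // close() routes the free through the arena allocator, which has no
    // record of the pre-RMM address, so the deallocation fails.
    preRmm.close();
  }
}
```

Consistent with the PR title, the fix is to make the test allocate only after RMM is initialized, so allocation and deallocation go through the same allocator.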