Zero-copy between CUDA and XLA #6971
OK, for moving a CUDA tensor containing a single value to XLA (case 1.2), I think I know what is happening, and for this case it might be acceptable to go through the CPU.
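A minimal sketch of what case 1.2 refers to (the actual repro script from this thread is not shown, so the code below is only illustrative):

```python
import torch
import torch_xla.core.xla_model as xm

# Case 1.2: a CUDA tensor holding a single value is moved to the XLA device.
# Today this path may round-trip through host (CPU) memory rather than staying
# on the GPU, which is acceptable for a scalar but not for larger tensors.
scalar_cuda = torch.tensor(3.14, device="cuda")
scalar_xla = scalar_cuda.to(xm.xla_device())
print(scalar_xla.device)  # e.g. xla:0
```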
So I ran:
I figured it out. The error above only exists in IFRT, but since we are using PJRT, we don't have that issue. I added a test for it. Now I'm getting another error:
I used CUDA_VISIBLE_DEVICES=1 to constrain the run to a single device and got an OOM:
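For completeness, constraining the run to one GPU works by setting the environment variable before CUDA is initialized; a rough sketch (the repro script itself is not shown in this thread):

```python
import os

# Must be set before CUDA is initialized (i.e. before the first CUDA call),
# otherwise it has no effect on which devices the process can see.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print(torch.cuda.device_count())  # 1: only the selected GPU is visible
```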
Well, actually, I realized that the OOM above was from my V100 machine. I ran the same script and code on my A100 machine and it ran fine.
This issue will be used to track the work for zero-copy between CUDA and XLA.
Inspired by
I implemented a POC at #6970.
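For context, the kind of interchange this issue is after would look roughly like the sketch below, assuming a DLPack-style bridge; the module path torch_xla.utils.dlpack is an assumption here and may not match what the POC in #6970 actually exposes:

```python
import torch
from torch.utils.dlpack import to_dlpack

# Assumed torch_xla DLPack entry point; the real API surface may differ.
from torch_xla.utils.dlpack import from_dlpack as xla_from_dlpack

cuda_t = torch.arange(8, device="cuda", dtype=torch.float32)

# Export the CUDA tensor as a DLPack capsule and import it on the XLA side.
# With true zero-copy, both tensors alias the same device buffer instead of
# bouncing through host memory.
capsule = to_dlpack(cuda_t)
xla_t = xla_from_dlpack(capsule)
```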
Current status:
Currently, it fails with an error.
With GPU, I can see the stack trace:
With debug prints:
Will look into it.
cc: @ysiraichi @JackCaoG @miladm