Regarding the usage of AMD (Navi 12 Radeon Pro V520) #2188
Replies: 1 comment
-
Hi @anujajyothiv. This seems like a PyTorch issue rather than a BoTorch one. I'd recommend seeking help in PyTorch community forums: https://discuss.pytorch.org/
-
Hi, I am facing issues while trying to use this GPU.
I first tried the following:
pip install torch==1.11.0+rocm4.5.2 torchvision==0.12.0+rocm4.5.2 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/rocm4.5.2
This produced the error:
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
Then I tried:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
In the code I set the following:
os.environ["HIP_LAUNCH_BLOCKING"] = "1"
os.environ["TORCH_USE_HIP_DSA"] = "1"
os.environ["HIP_VISIBLE_DEVICES"] = "0"
os.environ["HSA_OVERRIDE_GFX_VERSION"] ="10.3.0"
The model I used is large-v3. I got:
OutOfMemoryError: HIP out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 7.86 GiB of which 0 bytes is free. Of the allocated memory 7.33 GiB is allocated by PyTorch, and 374.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_HIP_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Your script attempted to allocate 26.00 MiB of memory on GPU 0.
GPU 0 has a total capacity of 7.86 GiB.
Currently, the GPU has 0 bytes of free memory available.
Out of the allocated memory, 7.33 GiB is already allocated by PyTorch.
Additionally, 374.37 MiB is reserved by PyTorch but is unallocated.
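The figures in the message can be checked with a little arithmetic. The following is only a rough sketch: the helper `memory_breakdown` is a hypothetical name introduced here, the numbers are copied from the error above, and the allocator setting is the one the message itself suggests (it is assumed here that it has to be set before torch is imported to take effect).

```python
import os

# Setting suggested by the error message itself. Assumption: it should be set
# before torch is imported so the caching allocator picks it up.
os.environ.setdefault("PYTORCH_HIP_ALLOC_CONF", "expandable_segments:True")

MIB_PER_GIB = 1024


def memory_breakdown(total_gib, allocated_gib, reserved_unallocated_mib):
    """Hypothetical helper: split the OOM figures into memory held by PyTorch
    and memory not accounted for by PyTorch, both in GiB."""
    held_by_pytorch = allocated_gib + reserved_unallocated_mib / MIB_PER_GIB
    outside_pytorch = total_gib - held_by_pytorch
    return held_by_pytorch, outside_pytorch


# Figures copied from the error message above.
held, outside = memory_breakdown(7.86, 7.33, 374.37)
print(f"held by PyTorch: {held:.2f} GiB, not accounted for by PyTorch: {outside:.2f} GiB")
```

With these numbers PyTorch itself holds almost all of the 7.86 GiB, which is consistent with the model and its activations simply not fitting on this card rather than with fragmentation alone.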
I then tried the base and small models and got the error below.
:0:rocdevice.cpp :2692: 11020837680 us: [pid:41270 tid:0x7f949e9ff640] Callback: Queue 0x7f9387400000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)
Note that print(torch.cuda.get_device_name(torch.cuda.current_device())) does show AMD Radeon Pro V520.
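For anyone trying to reproduce this, a minimal sketch of the setup might look like the following. It assumes the environment variables need to be in place before torch is imported, uses the override value from this post as-is (it is not claimed to be the right one for Navi 12), and `describe_device` is a hypothetical helper name.

```python
import os

# Assumption: set these before importing torch so the HIP runtime sees them
# when it initializes. Values are taken from the post, not verified for Navi 12.
os.environ["HIP_VISIBLE_DEVICES"] = "0"
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def describe_device():
    """Hypothetical helper: return a one-line description of the current GPU."""
    if torch is None:
        return "torch not installed"
    if not torch.cuda.is_available():  # ROCm builds also expose the torch.cuda API
        return "no GPU visible to PyTorch"
    index = torch.cuda.current_device()
    props = torch.cuda.get_device_properties(index)
    total_gib = props.total_memory / 1024**3
    return f"{torch.cuda.get_device_name(index)}, {total_gib:.2f} GiB total"


print(describe_device())
```

Reporting the device name together with the total memory makes it easier to tell whether the card is being detected correctly before any model is loaded.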