CUDA out of memory error while allocating memory #16
Comments
@jagadeesh09 Model parameters are not the only thing that occupies GPU memory; reducing batch_size to 16 or smaller would help.
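For reference, a minimal sketch of what lowering the batch size could look like if the loader is a standard torch.utils.data.DataLoader (the actual code in dataset.py may be structured differently; train_dataset here is a placeholder):

```python
import torch.utils.data

# Hypothetical loader setup; dataset.py in this repository may build it differently.
train_loader = torch.utils.data.DataLoader(
    train_dataset,      # placeholder for an existing torch Dataset
    batch_size=16,      # smaller batches -> less activation memory on the GPU
    shuffle=True,
    num_workers=4,
    pin_memory=True)
```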
When I change the two lines of
I have already installed the latest PyTorch, since it had solved this tensor.cpu() problem. I am using https://www.archlinux.org/packages/community/x86_64/python-pytorch-cuda/. So what is actually still triggering this tensor.cpu() issue?
replace … It worked for me. I think that …
I observed that the out-of-memory error still occurs even if I change the batch size to 16. The first round was OK, but the second wasn't. I think we should delete the previous, now-unused model on the GPU to free up memory before allocating the new one.
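A minimal sketch of that idea, assuming the old network is held in a variable called model (torch.cuda.empty_cache() exists in newer PyTorch releases; the older THC-based builds used in this thread may not have it):

```python
import gc
import torch

# Drop the Python reference to the previous model so its GPU tensors can be freed,
# then ask the caching allocator to release the now-unused blocks.
del model                 # `model` is assumed to be the previously built/pruned network
gc.collect()
torch.cuda.empty_cache()  # available in newer PyTorch; older versions may lack this call
```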
I met a similar issue and solved it by setting pin_memory=False.
Could you clarify where pin_memory is set? How can I change it to False?
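pin_memory is normally an argument of torch.utils.data.DataLoader, so it would be wherever the loaders are constructed (presumably dataset.py in this repository). A minimal sketch, with train_dataset as a placeholder:

```python
import torch.utils.data

# pin_memory=True stages each batch in page-locked host memory before the GPU copy;
# that staging allocation is what fails in the THCCachingHostAllocator traceback below.
train_loader = torch.utils.data.DataLoader(
    train_dataset,       # placeholder for an existing torch Dataset
    batch_size=16,
    shuffle=True,
    pin_memory=False)    # skip the page-locked staging buffers
```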
Hi,
I am working on a Tesla K40 machine with 12 GB of GPU memory, and I am hitting this error constantly. If I calculate the memory required by the VGG model for the batch size mentioned in dataset.py, it is far less than the available GPU memory. What could be the reason, and how can I overcome it? (A rough back-of-envelope estimate is sketched after the traceback below.)
I am also facing this right after initializing the model, when calling cuda().
THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCCachingHostAllocator.cpp line=258 error=2 : out of memory
Traceback (most recent call last):
File "finetune.py", line 272, in
fine_tuner.train(epoches = 20)
File "finetune.py", line 163, in train
self.train_epoch(optimizer)
File "finetune.py", line 182, in train_epoch
for batch, label in self.train_data_loader:
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 281, in next
return self._process_next_batch(batch)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 301, in process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 81, in worker_manager_loop
batch = pin_memory_batch(batch)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 148, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 142, in pin_memory_batch
return batch.pin_memory()
File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 92, in pin_memory
return type(self)().set_(storage.pin_memory()).view_as(self)
File "/usr/local/lib/python2.7/dist-packages/torch/storage.py", line 87, in pin_memory
return type(self)(self.size(), allocator=allocator).copy_(self)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/THCCachingHostAllocator.cpp:258
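For a rough sense of scale, here is a back-of-envelope estimate of VGG16 training memory. It uses my own approximation based on the commonly cited figure of roughly 93 MB of forward-pass activations per 224x224 image, roughly doubled for the backward pass; these are not numbers taken from this repository:

```python
# Back-of-envelope VGG16 training memory estimate (approximate figures only).
params_gb = 138e6 * 4 / 1024**3          # ~138M float32 weights  -> ~0.51 GB
grads_gb = params_gb                      # gradients mirror the weights
acts_gb_per_image = 2 * 93.0 / 1024       # forward (~93 MB/image) + backward, in GB

for batch_size in (16, 32, 64):
    total = params_gb + grads_gb + batch_size * acts_gb_per_image
    print("batch_size={}: ~{:.1f} GB".format(batch_size, total))
```

By this estimate a batch of 64 already approaches 12 GB before optimizer state and cached allocator blocks are counted, which is consistent with the advice above to drop the batch size on a 12 GB K40.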