CUDA out of memory error: while allocating the memory #16

Open
jagadeesh09 opened this issue Mar 15, 2018 · 7 comments

@jagadeesh09

Hi

I am working on a Tesla K40 machine with a 12 GB GPU, and I keep running into this error. If I calculate the memory the VGG model needs for the batch size set in dataset.py, it is far less than the available GPU memory. What could be the reason, and how can I overcome it?
I hit the error after initializing the model and also while calling cuda().

THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCCachingHostAllocator.cpp line=258 error=2 : out of memory
Traceback (most recent call last):
  File "finetune.py", line 272, in <module>
    fine_tuner.train(epoches = 20)
  File "finetune.py", line 163, in train
    self.train_epoch(optimizer)
  File "finetune.py", line 182, in train_epoch
    for batch, label in self.train_data_loader:
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 81, in _worker_manager_loop
    batch = pin_memory_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 148, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 142, in pin_memory_batch
    return batch.pin_memory()
  File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 92, in pin_memory
    return type(self)().set_(storage.pin_memory()).view_as(self)
  File "/usr/local/lib/python2.7/dist-packages/torch/storage.py", line 87, in pin_memory
    return type(self)(self.size(), allocator=allocator).copy_(self)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/THCCachingHostAllocator.cpp:258
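
For reference, a rough parameter-memory estimate for VGG16 (the calculation alluded to in the question) can be done as below; torchvision's vgg16 is used here only as a stand-in for whatever VGG variant the repo actually loads:

from torchvision import models

# Count the parameters of a stock VGG16 and convert to bytes (float32 = 4 bytes each).
model = models.vgg16(pretrained=False)
num_params = sum(p.numel() for p in model.parameters())
print("parameters: %.0fM, ~%.2f GB as float32" % (num_params / 1e6, num_params * 4 / 1024 ** 3))
# ~138M parameters, roughly 0.5 GB: far below 12 GB, so the weights alone do not
# explain the out-of-memory error; activations, gradients, and pinned host buffers
# account for the rest.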

@guangzhili

@jagadeesh09 Model parameters are not the only thing occupying GPU memory; activations, gradients, and optimizer state take space too. Reducing batch_size to 16 or smaller should help.
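
For anyone unsure where that knob lives, here is a minimal DataLoader sketch with the smaller batch size; the stand-in tensors and the exact arguments are assumptions, since the real loaders are built inside the repo's dataset.py:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; in the repo the datasets are constructed in dataset.py.
dummy = TensorDataset(torch.randn(64, 3, 224, 224), torch.zeros(64, dtype=torch.long))

# Lowering batch_size (e.g. from 32 to 16) shrinks the per-batch activation and
# pinned-buffer memory, which is usually what runs out rather than the weights.
loader = DataLoader(dummy, batch_size=16, shuffle=True, num_workers=4, pin_memory=True)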

@buttercutter

@guangzhili

When I change the two batch_size values in dataset.py from 32 to 16, I get the following error. Why?

[phung@archlinux pytorch-pruning]$ python finetune.py --train
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:187: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  warnings.warn("The use of the transforms.Scale transform is deprecated, " +
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:562: UserWarning: The use of the transforms.RandomSizedCrop transform is deprecated, please use transforms.RandomResizedCrop instead.
  warnings.warn("The use of the transforms.RandomSizedCrop transform is deprecated, " +
Epoch:  0
Accuracy:  0.3398
Epoch:  1
Accuracy:  0.8265
Epoch:  2
Accuracy:  0.6071
Epoch:  3
Accuracy:  0.63
Epoch:  4
Accuracy:  0.5951
Epoch:  5
Accuracy:  0.5837
Epoch:  6
Accuracy:  0.5537
Epoch:  7
Accuracy:  0.5672
Epoch:  8
Accuracy:  0.506
Epoch:  9
Accuracy:  0.5962
Epoch:  10
Accuracy:  0.6039
Epoch:  11
Accuracy:  0.5436
Epoch:  12
Accuracy:  0.6215
Epoch:  13
Accuracy:  0.5622
Epoch:  14
Accuracy:  0.5872
Epoch:  15
Accuracy:  0.5969
Epoch:  16
Accuracy:  0.5741
Epoch:  17
Accuracy:  0.5725
Epoch:  18
Accuracy:  0.6213
Epoch:  19
Accuracy:  0.6483
Finished fine tuning.
[phung@archlinux pytorch-pruning]$ python finetune.py --prune
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:187: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  warnings.warn("The use of the transforms.Scale transform is deprecated, " +
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:562: UserWarning: The use of the transforms.RandomSizedCrop transform is deprecated, please use transforms.RandomResizedCrop instead.
  warnings.warn("The use of the transforms.RandomSizedCrop transform is deprecated, " +
Accuracy:  0.6483
Number of prunning iterations to reduce 67% filters 5
Ranking filters.. 
Traceback (most recent call last):
  File "finetune.py", line 270, in <module>
    fine_tuner.prune()
  File "finetune.py", line 217, in prune
    prune_targets = self.get_candidates_to_prune(num_filters_to_prune_per_iteration)
  File "finetune.py", line 186, in get_candidates_to_prune
    self.prunner.normalize_ranks_per_layer()
  File "finetune.py", line 101, in normalize_ranks_per_layer
    v = v / np.sqrt(torch.sum(v * v))
  File "/usr/lib/python3.7/site-packages/torch/tensor.py", line 432, in __array__
    return self.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[phung@archlinux pytorch-pruning]$ 

@buttercutter

I have already installed the latest PyTorch, which was supposed to have fixed this tensor.cpu() problem.

I am using https://www.archlinux.org/packages/community/x86_64/python-pytorch-cuda/

So what is actually still triggering this tensor.cpu() issue?

@nguyenbh1507

v = v / np.sqrt(torch.sum(v * v))

Replace np.sqrt(torch.sum(v * v)) with v.norm().

It worked for me. I think np.sqrt() implicitly converts the tensor to a NumPy array, which only works for CPU tensors, not CUDA ones.
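
A minimal sketch of that change, with a stand-in tensor in place of the actual per-layer rank tensor from the pruner:

import torch

# Stand-in for a per-layer rank tensor; lives on the GPU when one is available.
v = torch.rand(64, device="cuda" if torch.cuda.is_available() else "cpu")

# Before: np.sqrt() calls Tensor.__array__(), which tries .numpy() and fails for CUDA tensors.
# v = v / np.sqrt(torch.sum(v * v))

# After: stay in torch so the computation runs on whatever device v lives on.
v = v / torch.sqrt(torch.sum(v * v))   # or equivalently: v = v / v.norm()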

@nguyenbh1507

I observed that the out-of-memory error still occurs even after changing the batch size to 16. The first pruning round was fine, but the second was not. I think we should delete the previous, now unused, model on the GPU to free memory before allocating the new one.
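
A sketch of that idea between pruning rounds; the stand-in VGG16 plays the role of whatever model the previous iteration left behind, and note that torch.cuda.empty_cache() only releases cached blocks that are no longer referenced anywhere:

import gc
import torch
from torchvision import models

model = models.vgg16()            # stand-in for the model left over from the previous round
if torch.cuda.is_available():
    model = model.cuda()

# Drop the last Python reference, then hand the cached blocks back to the CUDA
# allocator so the next allocation does not compete with the stale model's memory.
del model
gc.collect()
torch.cuda.empty_cache()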

@ChaoLi977

I met a similar issue and solved it by setting pin_memory=False.
https://discuss.pytorch.org/t/using-pined-memory-causes-out-of-memory-error-even-though-batch-size-is-set-to-low-values/30602
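
The setting lives on the DataLoader; a minimal sketch of turning it off is below, with a stand-in dataset in place of whatever dataset.py actually builds:

import torch
from torch.utils.data import DataLoader, TensorDataset

dummy = TensorDataset(torch.randn(64, 3, 224, 224), torch.zeros(64, dtype=torch.long))

# pin_memory=True stages every batch in page-locked host memory, and the original
# traceback fails inside exactly that step (THCCachingHostAllocator). Turning it off
# trades some host-to-device copy speed for lower memory pressure.
loader = DataLoader(dummy, batch_size=16, shuffle=True, num_workers=4, pin_memory=False)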

@akbarali2019

I met a similar issue and solved it by setting pin_memory=False. https://discuss.pytorch.org/t/using-pined-memory-causes-out-of-memory-error-even-though-batch-size-is-set-to-low-values/30602

Could you clarify where pin_memory is set? How can I change it to False?
