Radeon HD 7850 / Clover: GPU locks up during backpropagation #11
Replies: 5 comments
-
Hi, sorry, I only just noticed this. The biggest problem with the 7850 is its low GPU memory. Which version do you have, 1 GB or 2 GB? It is GCN1, so if Winograd does not work, the general convolution should still work (though not as efficiently). You can take a look at line 722 in src/core/conv.cpp and set
That is probably what you are going to get. It is a ~1.5 TFLOPS-class GPU. I used a GTX 960, which is 2.2 TFLOPS with 4 GB, and it is only several times faster than an i5-6600; you can't compare it to a ~8 TFLOPS 1080 with a highly optimized algorithm. Also, Mesa isn't the best driver. If you can use AMDGPU-PRO or ROCm you'll get better results. I suggest taking a run using that setting. Also try to run the tests: I noticed some hang-ups in AMD GPU drivers when memory is overloaded. Make sure you don't use more memory than you have.
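For reference, a minimal sketch (plain OpenCL C API, not dlprimitives code) of how to query what the device actually reports for global, local and single-allocation memory, so you can stay below those limits:

```cpp
// Standalone check: ask the Clover device what it claims to offer.
// Drivers (Clover in particular) may report less than the physical VRAM.
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, nullptr) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr) != CL_SUCCESS) {
        std::fprintf(stderr, "no OpenCL GPU device found\n");
        return 1;
    }
    cl_ulong global_mem = 0, local_mem = 0, max_alloc = 0;
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,    sizeof(global_mem), &global_mem, nullptr);
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,     sizeof(local_mem),  &local_mem,  nullptr);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(max_alloc),  &max_alloc,  nullptr);
    std::printf("global: %llu MB, max single alloc: %llu MB, local: %llu KB\n",
                (unsigned long long)(global_mem >> 20),
                (unsigned long long)(max_alloc  >> 20),
                (unsigned long long)(local_mem  >> 10));
    return 0;
}
```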
-
Make sure you use a small batch size: set -B4 or even -B2. You can't use large batches, especially with ResNet-50 or VGG-16, which use lots of memory. I wouldn't try them on a 2 GB GPU.
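As a rough illustration (the 256-channel 56x56 shape below is an arbitrary ResNet-style example, not a measurement), activation memory scales linearly with batch size, which is why -B2 or -B4 may fit where larger batches will not:

```cpp
// Back-of-the-envelope sketch, not dlprimitives code: bytes needed for one
// NCHW float32 tensor at various batch sizes.
#include <cstddef>
#include <cstdio>

std::size_t tensor_bytes(std::size_t batch, std::size_t channels,
                         std::size_t height, std::size_t width) {
    return batch * channels * height * width * sizeof(float);
}

int main() {
    for (std::size_t b : {2, 4, 16, 64}) {
        std::printf("batch %2zu: %7.1f MB for one 256x56x56 feature map\n",
                    b, tensor_bytes(b, 256, 56, 56) / (1024.0 * 1024.0));
    }
    return 0;
}
```

A real network holds many such tensors (plus gradients during backprop), so the total grows much faster than this single-tensor estimate.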
-
Can you show what you did?
-
One thing I noticed is that when NaN values appear, the atomic operations become very, very slow (I don't know why), so make sure you don't have NaN values.
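A minimal sketch of such a check, assuming the tensor can be read back to the host as float32 (the helper name is illustrative, not a dlprimitives API):

```cpp
// Scan a host-side copy of a tensor for NaN/Inf before continuing training.
#include <cmath>
#include <vector>

bool has_bad_values(const std::vector<float> &host_buffer) {
    for (float v : host_buffer) {
        if (std::isnan(v) || std::isinf(v))
            return true;
    }
    return false;
}

// Usage: after reading a buffer back (e.g. with clEnqueueReadBuffer),
//   if (has_bad_values(buf)) { /* lower the learning rate / check the data */ }
```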
-
Probably this was the issue: 6a34a58
-
Hi there!
I have an old Radeon HD 7850 GPU (using the standard Debian Bullseye Mesa/Clover OpenCL driver), whose software options for accelerating deep learning tasks are unfortunately rather limited. The only one I got to work was caffe-opencl, but the performance was not great (a 2-3x speedup compared to an old quad-core Athlon CPU, and 100x slower than TensorFlow/Keras on a GTX 1080 with CUDA). So I turned to dlprimitives, which I managed to compile, but there was an error in the winograd_3x3 kernel compilation (local memory limit exceeded). I could fix this by adjusting the workgroup size and other parameters. Forward propagation is now successful for all sample networks. However, training is not possible because the GPU locks up while processing the first batch when backprop is enabled.
I am not experienced with OpenCL, but I tried isolating the issue, and it seems to have something to do with conv2d backpropagation. When using GEMM convolution for backprop it always seems to hang, but only occasionally when using Winograd/depthwise.
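For reference, a minimal sketch (assuming a built cl_program that contains the winograd_3x3 kernel named in the error above) of how the per-kernel work-group and local-memory limits can be queried before picking a workgroup size:

```cpp
// Plain OpenCL C API, not dlprimitives code: ask the driver what this
// particular kernel is actually allowed to use on this device.
#include <CL/cl.h>
#include <cstdio>

void print_kernel_limits(cl_program program, cl_device_id device) {
    cl_int err = CL_SUCCESS;
    cl_kernel kernel = clCreateKernel(program, "winograd_3x3", &err);
    if (err != CL_SUCCESS) {
        std::fprintf(stderr, "clCreateKernel failed: %d\n", err);
        return;
    }
    size_t max_wg = 0;
    cl_ulong local_bytes = 0;
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(max_wg), &max_wg, nullptr);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                             sizeof(local_bytes), &local_bytes, nullptr);
    std::printf("max work-group size: %zu, local memory used: %llu bytes\n",
                max_wg, (unsigned long long)local_bytes);
    clReleaseKernel(kernel);
}
```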
I know this is an old card that is probably not going to give spectacular performance even when optimally exploited, but it would still be cool to see it run using dlprimitives. All the best