Fix: Resolves memory leak caused by using CRAFT detector with detect() or readtext(). #1278
base: master
Conversation
I should clarify: this resolves GPU VRAM memory leaks. It does not resolve the CPU RAM memory leaks.
Corrected to only call empty_cache() if the device in use is cuda.
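Roughly, the guard looks like this (a paraphrased sketch, assuming device is the string or torch.device that test_net() receives, not the literal diff):

    import torch

    def maybe_empty_cuda_cache(device):
        # Only release cached GPU blocks when we are actually on CUDA;
        # on CPU there is nothing for the caching allocator to hand back.
        if torch.device(device).type == "cuda":
            torch.cuda.empty_cache()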
@jonashaag did you attempt to replicate my results? It'll take you less than 15 minutes to give it a whirl and see whether it works for you. It did work for me, and the pytorch.org blog post I linked explains exactly why. I'll quote here:
I'm not going to claim that I think it SHOULD work this way. But this isn't the first time weird garbage collection and scoping behavior across CPU/GPU has caused problems. Again, try it and let us all know whether it actually works for you.
Sorry, maybe I misunderstood the reason why
I don't think I understand it well enough to explain it better. I also call

I'm far from an expert, but I do know that these changes halted the memory leaks I had, and I haven't had a CUDA OOM error since. My best suggestion, since action produces information, is that you give it a whirl and let us know if it works. If it doesn't work for you, that's valuable too: knowing how your machine differs from mine helps me make further changes to avoid these errors if I scale up or swap machines.
@jonashaag Hey, I'd love to know if you ever got a chance to try this.
Sorry, I've switched to another engine (macOS Live Text) because it's better and much faster. I feel a bit bad to have left such a smart-ass comment initially and not contribute anything of substance here :-/
It's all good. Are you using Live Text natively on the devices, or can it be hosted in a way that allows it to replace EasyOCR for serving a website that's not on an Apple device?
Yes, we run a Mac mini in production (via Scaleway). If you are interested I can share some code.
Thanks! I was able to reproduce the leak, and your fix works. It took me a while to figure out this issue. Can we merge this PR ASAP and bump the version of EasyOCR? (For now I've just applied a local fix.)
@@ -45,6 +45,9 @@ def test_net(canvas_size, mag_ratio, net, image, text_threshold, link_threshold,
with torch.no_grad():
Suggested change:
- with torch.no_grad():
+ with torch.inference_mode():
Seems useful, but perhaps it should be a separate PR? I was focused on the GPU memory leak with this PR, so I'm not sure the two should be packaged together.
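For anyone curious, a standalone illustration of the suggested swap (toy model, not the actual detection.py code):

    import torch

    net = torch.nn.Linear(4, 2)
    x = torch.randn(1, 4)

    # Current code: no_grad() disables gradient tracking for the forward pass.
    with torch.no_grad():
        y1 = net(x)

    # Suggested change: inference_mode() also skips view/version-counter
    # bookkeeping, so it can be slightly faster and lighter on memory, but
    # tensors created inside can never be used in autograd later.
    with torch.inference_mode():
        y2 = net(x)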
Any news regarding the CRAFT-related leak? I am noticing leaks either way, whether using the GPU or not. Running on macOS 14.5 here; at every run the memory usage increases by 10 to 100 MB, which is crazy. How do people run EasyOCR in production?
This resolves the GPU memory leak. I didn't test whether it also fixes the CPU memory leak, so I can't say conclusively. You could try it and let us know.
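If you want to check the CPU side, here's a rough way to watch process memory across repeated calls (assumes psutil is installed and a test image exists at example.png; this is not part of the PR):

    import psutil
    import easyocr

    proc = psutil.Process()
    reader = easyocr.Reader(['en'], gpu=False)

    for i in range(20):
        reader.readtext('example.png')
        rss_mb = proc.memory_info().rss / 1e6
        # If RSS keeps climbing run after run, the CPU-side leak is still there.
        print(f"run {i}: RSS = {rss_mb:.1f} MB")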
This fix enables garbage collection to work appropriately when the following runs:
EasyOCR/easyocr/detection.py
Line 24 in c999505
See https://pytorch.org/blog/understanding-gpu-memory-2/#why-doesnt-automatic-garbage-collection-work for more detail.
Running torch.cuda.empty_cache() in test_net() before returning allows nvidia-smi to be accurate. Interestingly, nvidia-smi showed that GPU memory usage per process was 204MiB upon reader initialization, and it would then increase to 234MiB or 288MiB after running easyocr.reader.detect(), but it would not increase beyond that point, and in some cases it reduced back down to 234MiB. I think this has something to do with

One note is that I tested this on a single-GPU machine where I changed
EasyOCR/easyocr/detection.py
Line 86 in c999505
to net = net.to(device), removing DataParallel. There's no reason this shouldn't work on multi-GPU machines, but noting that it wasn't tested on one. I also only tested this on the CRAFT detector, not DBNet.
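To make the pattern concrete, here's a self-contained sketch of what the patch does at the end of test_net() (run_detection and the toy model are stand-ins of mine, not EasyOCR code):

    import torch

    def run_detection(net, x, device):
        # Mirrors the tail of test_net(): forward pass without autograd,
        # move results back to the CPU, then release the cached GPU blocks
        # before returning so nvidia-smi reflects actual usage.
        net = net.to(device)   # plain .to(device), no DataParallel wrapper
        x = x.to(device)
        with torch.no_grad():
            y = net(x)
        result = y.cpu()
        if device.type == "cuda":
            torch.cuda.empty_cache()
        return result

    if __name__ == "__main__":
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        net = torch.nn.Conv2d(3, 8, 3, padding=1)
        x = torch.randn(1, 3, 64, 64)
        out = run_detection(net, x, device)
        if device.type == "cuda":
            # Allocated memory is what live tensors still hold (the weights);
            # reserved memory is what the caching allocator keeps, which
            # empty_cache() hands back to the driver.
            print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())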
Relevant package versions:
easyocr 1.7.1
torch 2.2.1+cu121
torchvision 0.17.1+cu121
Hope this helps!