Memory overflow for large instance #6
Hi @dttrugman,

I'm struggling with the memory consumption of GrowClust3D on a large instance. My input contains about 350,000 events and 21 million differential travel times. I've tried running on a machine with 375 GB of memory, but unfortunately the computation runs out of memory. The consumption grows slowly over time; the code only crashed after processing 2.6 million pairs.

From a theoretical standpoint, I'm not sure why the memory consumption of GrowClust3D should grow over time, so I had a look at the code. My suspicion is that the growth comes from the dictionary `cid2pairD1`, which maps each cluster to the indices of all pairs originating from that cluster. In each iteration, the pair lists of the two merged clusters are combined and stored under the surviving cluster's key, but the entry for the now-defunct cluster is never cleared. Over time, the dictionary should therefore grow very large, because the indices of many pairs end up stored under many cluster keys. As far as I understand it, the defunct cluster's entry could be cleared after the merge, since it will never be requested again. However, I don't understand the code well enough (and don't really know Julia), so I wanted to ask for your opinion on the memory issue and its potential cause.
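To illustrate why an uncleared entry matters at this scale, here is a toy Julia sketch (not GrowClust3D's actual code; all names are illustrative). A dictionary maps cluster IDs to vectors of pair indices, and clusters are merged in a chain; without deleting defunct keys, the total number of retained indices grows quadratically with the number of merges, while clearing each defunct entry keeps it linear:

```julia
# Toy model of the suspected leak: d maps cluster IDs to pair-index lists,
# starting with one index per cluster.
function total_stored(clear_defunct::Bool; n::Int=1_000)
    d = Dict(i => [i] for i in 1:n)
    for c in 1:n-1                      # merge cluster c into cluster c+1
        append!(d[c+1], d[c])           # surviving cluster absorbs the indices
        clear_defunct && delete!(d, c)  # the proposed fix: drop the dead entry
    end
    return sum(length(v) for v in values(d))
end

println(total_stored(false))  # without clearing: 500500 indices retained, O(n^2)
println(total_stored(true))   # with clearing: 1000 indices, O(n)
```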
Comments

Hi @yetinam, thanks for letting me know about this issue. Let me look into this problem and see what I can do; it's been a while since I've studied some of the lower-level functions that may be causing the bottleneck. Daniel
OK, I took a look and indeed, those two dictionaries are not being cleared. If this was the issue, it should now be fixed with the latest update to the code.
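For reference, a minimal sketch of what such a fix could look like, assuming both dictionaries map cluster IDs to vectors of pair indices. `cid2pairD1` is the name from the issue; `cid2pairD2` is a hypothetical stand-in for the second dictionary, and the actual GrowClust3D internals may differ:

```julia
# Hypothetical sketch only: fold the defunct cluster's pair indices into the
# surviving cluster, then delete the dead keys so neither dictionary can grow
# without bound.
function merge_clusters!(cid2pairD1::Dict{Int,Vector{Int}},
                         cid2pairD2::Dict{Int,Vector{Int}},
                         keep::Int, defunct::Int)
    append!(get!(cid2pairD1, keep, Int[]), get(cid2pairD1, defunct, Int[]))
    append!(get!(cid2pairD2, keep, Int[]), get(cid2pairD2, defunct, Int[]))
    delete!(cid2pairD1, defunct)  # never queried again, so safe to remove
    delete!(cid2pairD2, defunct)
    return nothing
end
```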
Thanks! I'll update my version and give it a try next week.
I got around to testing the new version, but I'm still running into memory trouble. I got the code (in the old version) to run successfully with 750 GB of memory, but the new version still crashes with 375 GB. I don't really have a good setup to measure exact memory consumption, though, so I don't know whether the fix reduced usage at all.
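In case it helps with measurement: a low-effort option from within Julia is to log `Sys.maxrss()`, the process's peak resident set size in bytes, at checkpoints during the run; on Linux, `/usr/bin/time -v julia script.jl` reports the same figure at exit. A sketch with a placeholder workload:

```julia
# Print peak memory at regular checkpoints; the loop body stands in for real
# work (e.g., one pair relocation) and is purely a placeholder.
for i in 1:1_000_000
    # ... one unit of work here ...
    if i % 200_000 == 0
        gib = Sys.maxrss() / 2^30  # peak resident set size so far, in GiB
        println("iteration $i: peak RSS $(round(gib, digits=2)) GiB")
    end
end
```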
Thanks for checking. I'm not sure where the bottleneck is at this point, but I'll keep looking for areas of improvement.