Memory overflow for large instance #6

Open · yetinam opened this issue Sep 5, 2024 · 5 comments
Labels: enhancement (New feature or request)

yetinam commented Sep 5, 2024

Hi @dttrugman ,

I'm struggling with the memory consumption of GrowClust3D for a large instance. My input contains about 350,000 events and 21 million differential travel times. I've tried running on a machine with 375 GB of memory, but unfortunately the computation runs out of memory. The memory consumption grows slowly over time, i.e., the code only crashed after processing 2.6 million pairs.

From a theoretical standpoint, I'm not sure why the memory consumption of GrowClust3D should grow over time, so I had a look into the code. My suspicion is that the memory explosion comes from the dictionary cid2pairD1, which maps each cluster to the indices of all pairs originating from that cluster. In each merge, the entries of the two clusters are combined and stored under one of the cluster keys. However, the entry for the now merged, and thereby defunct, cluster is not cleared. I assume that over time the dictionary gets very large because the indices of many pairs end up stored under many clusters. As far as I understand, the entry for the defunct cluster could be cleared after the merge because it will never be requested again. However, I don't understand the code well enough (and don't really know how to use Julia), so I wanted to ask for your opinion on the memory issue and its potential cause.
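
To make the suspected pattern concrete, here is a rough sketch of the kind of fix I have in mind. This is not the actual GrowClust3D code; only cid2pairD1 is a real name from the source, and my Julia may well be off:

```julia
# Rough sketch of the suspected merge pattern, not the actual GrowClust3D code.
# cid2pairD1 maps a cluster id to the indices of all pairs originating from it.
cid2pairD1 = Dict{Int,Vector{Int}}(1 => [10, 11], 2 => [12, 13])

# Merge cluster `drop` into cluster `keep`:
keep, drop = 1, 2
append!(cid2pairD1[keep], cid2pairD1[drop])

# Without this line, the defunct cluster keeps its pair-index list alive in the
# dictionary forever, so the memory held by cid2pairD1 grows with every merge:
delete!(cid2pairD1, drop)
```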

dttrugman (Owner) commented:

Hi @yetinam,

Thanks for letting me know about this issue. Let me look into this problem and see what I can do; it's been a while since I've studied some of these lower-level functions that may be causing the bottleneck.

Daniel

dttrugman added the enhancement (New feature or request) label on Sep 6, 2024
dttrugman (Owner) commented:

OK, I took a look and indeed, those two dictionaries were not being cleared. If this was the issue, it should now be fixed in the latest update to the code.

yetinam commented Sep 6, 2024

Thanks! I'll update my version and give it a try next week.

yetinam commented Sep 11, 2024

I got around to testing the new version, but I'm still running into memory trouble. I got the code (in the old version) to run successfully with 750 GB of memory, but the new version still crashes with 375 GB. I don't really have a good setup to measure the exact memory consumption, though, so I can't tell whether the fix reduced the memory usage at all.
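
For reference, the best I could probably do next time is a crude check from within Julia along these lines (Sys.maxrss and Base.summarysize are standard Julia calls; where exactly to hook this into the GrowClust3D run script is just my guess):

```julia
# Peak resident set size of the Julia process so far, in GB
println("max RSS: ", round(Sys.maxrss() / 1e9; digits=1), " GB")

# Approximate in-memory size of the suspect dictionary, in GB
println("cid2pairD1: ", round(Base.summarysize(cid2pairD1) / 1e9; digits=1), " GB")
```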

dttrugman (Owner) commented:

Thanks for checking. I'm not sure where the bottleneck is at this point, but I'll keep looking for areas of improvement.
