Memory management problem #113

Closed
rwest opened this issue Apr 20, 2013 · 3 comments
Labels
abandoned (abandoned issue/PR as determined by actions bot), stale (stale issue/PR as determined by actions bot), Status: Stale (This issue is out-of-date and may no longer be applicable)

Comments

@rwest
Member

rwest commented Apr 20, 2013

I am seeing strange behaviour when running RMG-Py on a computer with huge amounts of shared memory. The execution stats in the log file and statistics.xls claim that after ~2 hours it had used only 912 MB of memory, when the queuing system killed the job because it had exceeded its limit of 32,000 MB.

My guess is that the memory has been deallocated by Python, but not returned to the operating system. See http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm for example. I think that on this computer Python sees plenty of spare memory (the machine has 4,194,304 MB), so, to save time, it doesn't bother returning its free memory to the operating system, but then the queue manager kills the job when it exceeds 32,000 MB.

Any ideas how to confirm if this is the case, and if so, fix it?
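One way to check (a minimal sketch, assuming psutil is installed and a glibc-based Linux so that malloc_trim is available) would be to compare the process RSS before and after forcing a collection and then asking the allocator to hand freed pages back to the kernel:

```python
import ctypes
import gc

import psutil  # assumption: psutil is available on the cluster

def rss_mb():
    """Current resident set size of this process, in MB."""
    return psutil.Process().memory_info().rss / 1e6

print("RSS before: %.0f MB" % rss_mb())
gc.collect()                       # collect any cyclic garbage Python is still holding
print("RSS after gc.collect(): %.0f MB" % rss_mb())

libc = ctypes.CDLL("libc.so.6")    # glibc only
libc.malloc_trim(0)                # ask malloc to return free arena pages to the OS
print("RSS after malloc_trim(0): %.0f MB" % rss_mb())
```

If RSS only drops after malloc_trim(0), the memory really was free inside the process and simply hadn't been returned to the operating system.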

Edit: see also:

@rwest
Member Author

rwest commented Apr 21, 2013

I spent some time reading, experimenting with garbage collection, and adding more memory monitors, using the PBS batch queuing system to gather the job's resource usage. Another job just ran for a while. The queue info (collected at the end of each cycle) shows that, although the overall job's memory use is always a bit above the Python figure reported in the log file and increases in jumps, it is still only around 1.3 GB when the job is killed for exceeding 32 GB!
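The per-cycle monitor is roughly along these lines (a sketch, assuming a Torque/PBS installation whose `qstat -f` output includes a `resources_used.mem = ...kb` field; the parsing is ad hoc):

```python
import os
import re
import subprocess

def pbs_job_memory_kb(job_id=None):
    """Return the resources_used.mem value (in kB) that qstat -f reports
    for the given (or current) PBS job, or None if it can't be found."""
    job_id = job_id or os.environ.get("PBS_JOBID")
    if not job_id:
        return None
    output = subprocess.check_output(["qstat", "-f", job_id]).decode()
    match = re.search(r"resources_used\.mem\s*=\s*(\d+)kb", output)
    return int(match.group(1)) if match else None
```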
I then checked the end of stdout and noticed that the last message is this:

Warning: Increasing number of grains, decreasing grain size and trying again.
Using 104857600 grains from 338.57 to 1085.00 kJ/mol in steps of 0.00 kJ/mol to compute the k(T,P) values at 292.578 K

I'm guessing that allocating the arrays for 100 million grains is what fills the 32 GB of memory!
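For scale, a back-of-the-envelope check of that grain count (assuming double-precision values):

```python
grains = 104857600       # grain count reported in the log
bytes_per_value = 8      # one double-precision float per grain
print("%.2f GB per 1-D array" % (grains * bytes_per_value / 1e9))               # ~0.84 GB
print("arrays needed to reach 32 GB: %d" % (32e9 // (grains * bytes_per_value)))  # ~38
# A pressure-dependent calculation allocates many arrays of this length
# (densities of states, microcanonical rate coefficients, and so on), so a
# few dozen of them is already past the 32 GB limit.
```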
So, this is related to issue #77.

@rwest
Member Author

rwest commented May 8, 2013

Just ran this job:
[attached plot: memory-with-time]

It was killed during the next step for exceeding 32 GB.
Looking at the log, there's nothing obvious in the last step (no hundred million grains like the comment above), so I'm not sure what suddenly took 20 GB in one step.

That step contained 6 PDep calculations.
I ran them all through CanTherm independently, tracking their memory usage. They used between 101 and 413 MB of memory each. Not 20 GB...

What next? Stick memory-use logging statements throughout the code to see when it leaps up?
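Something like this could be sprinkled at candidate points (a minimal Linux-only sketch that reads /proc/self/statm; the function name is just an example):

```python
import logging
import os

_PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")  # bytes per memory page on this system

def log_memory_use(tag):
    """Log the current resident set size of this process, labelled with `tag`."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # second field = resident pages
    logging.info("Memory check [%s]: RSS = %.0f MB",
                 tag, resident_pages * _PAGE_SIZE / 1e6)

# e.g. log_memory_use("before exploring pdep network")
```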

@pierrelb pierrelb added the Status: Stale This issue is out-of-date and may no longer be applicable label Aug 28, 2015
@github-actions

This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days.

@github-actions github-actions bot added the stale stale issue/PR as determined by actions bot label Jun 22, 2023
@github-actions github-actions bot added the abandoned abandoned issue/PR as determined by actions bot label Jul 22, 2023
@github-actions github-actions bot closed this as not planned Jul 22, 2023