MTIE calculation crashes trying to create too big array #82
Comments
Thanks for the bug report and the patch. I wondered why we use such a memory-intensive solution in the first place, so I inquired a little. The mtie_rolling_window docstring points at the numpy-discussion archive; https://mail.python.org/pipermail/numpy-discussion/2011-January/054364.html seems more related ("Rolling window (moving average, moving std, and more)"); maybe the numbering changed between mail archives. But the whole point of that discussion is precisely to improve performance while saving memory! And it mentions cases where it does allocate large chunks of memory (https://mail.python.org/pipermail/numpy-discussion/2011-January/054370.html). So I see several ways forward.
Before I dig any further, maybe somebody remembers why this particular design was chosen? [edit: better understanding of the 2011 discussion...]
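For context, the rolling-window idiom from that numpy-discussion thread is the stride-tricks pattern below. This is a sketch: the shape/strides setup is the standard recipe from the thread, and per the traceback further down, allantools' mtie_rolling_window ends in the same as_strided call.

```python
import numpy as np

def mtie_rolling_window(a, window):
    """Return an (n - window + 1, window) rolling-window view of `a`.

    The view shares memory with `a` (no copy is made), but numpy still
    validates the virtual size of the result, which is what fails on
    32-bit Python for large inputs.
    """
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# e.g. mtie_rolling_window(np.arange(5.0), 3) ->
# [[0. 1. 2.], [1. 2. 3.], [2. 3. 4.]]
```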
My estimate is ~5 GB for this array (436365 x 1515 x 8 bytes). Even if it weren't double precision (I don't remember exactly), it would still be ~2.5 GB in single precision. I'm using 32-bit Python, so that is enough to hit the memory limit. But Python usually throws a MemoryError in such cases, so my first impression was that numpy has a size limit for arrays. Performance of the inserted code is not very good for small tau (small window sizes); it is slower than the original code. But it performs well for large windows, so in combination with the original code the overall performance is good.
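The arithmetic checks out:

```python
rows, window = 436365, 1515          # number of windows x window length
print(rows * window * 8 / 2**30)     # float64: ~4.93 GiB
print(rows * window * 4 / 2**30)     # float32: ~2.46 GiB
```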
Thanks for reporting this and suggesting a fix! Maybe an automated test would also be useful, where we can see the largest time-series that can be computed, both before and after applying this patch. I can try to merge the code above during the weekend.
I have applied the fix (above), but now I seem to be unable to reproduce the large memory consumption issue with the old code. @AntonPrygrodskyi, can you tell us which numpy version (and OS/Python) you are using? Without a test that actually triggers the new code, it is left without test coverage, which isn't great.
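One way to get some coverage without allocating a huge dataset would be to exercise a loop-based computation directly and check it against the vectorized rolling-window result on small inputs. A sketch, assuming mtie_rolling_window is reachable at module level (as in the traceback below) and with mtie_loop standing in for whatever name the patch uses:

```python
import numpy as np
import allantools

def mtie_loop(phase, w):
    # stand-in for the patch's loop-based fallback (name assumed)
    return max(float(phase[i:i + w].max() - phase[i:i + w].min())
               for i in range(len(phase) - w + 1))

def test_mtie_fallback_matches_vectorized():
    phase = np.random.RandomState(42).randn(1000)
    for w in (2, 10, 100):
        rw = allantools.mtie_rolling_window(phase, w)
        vectorized = float(np.max(np.max(rw, axis=1) - np.min(rw, axis=1)))
        assert np.isclose(mtie_loop(phase, w), vectorized)
```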
What data size did you use as a test? I would bet you need something on a par with your RAM + swap, which is huge on any recent machine... @AntonPrygrodskyi hinted he was using 32-bit Python (running on Windows, judging by the paths), which might lower the limit at which the bug is triggered. Anyway, I found it far too lengthy to create and process an appropriately large dataset on a 64-bit OS with a fair amount of RAM (8 GB): I wrote an infinite loop, gradually increasing the size of the sample, and gave up because of the execution time (during which the machine becomes less and less responsive...), but maybe one could try on a VM with very little RAM? Not practical for unit testing, though, and definitely not a 1-second test. My suggestion would be to keep the patch (which fixes #4, by the way!), and maybe add a warning-level log entry, something along the lines of: "Specific processing due to the large size of the data set".
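In code, that suggestion is a one-liner (the logger name is assumed):

```python
import logging

logging.getLogger("allantools").warning(
    "Specific processing due to the large size of the data set")
```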
@aewallin I'm using Python 3.6.5 32-bit; Windows 10 64-bit; 8 GB RAM. I'll check which numpy version I'm using and will comment on that separately. If there is a newer version, I'll also check how it performs. As I mentioned, my input dataset is 436365 elements long. How long were the dataset(s) you tried?
@aewallin My numpy version is 1.14.5, which seems to be newer than yours. So it could be a new bug rather than a fixed one. I'm also attaching the data file I used when I hit the issue. The sample interval for this data is 33 ms.
Type of exception deduced from aewallin#82
Hi,
When calculating MTIE for a data array 436365 elements long, I got an error saying:
```
Traceback (most recent call last):
  File "C:/Work/Projects/TimeErrorSimulationGUI/SupportingFiles/tDEVCalcCheck.py", line 137, in <module>
    tauArr, mtieAllanTools, a, b = allantools.mtie(freqTrend, 1 / 0.033, 'freq', taus=tauArr)
  File "C:\Work\Projects\TimeErrorSimulationGUI\GUI\allantools.py", line 1086, in mtie
    rw = mtie_rolling_window(phase, int(mj + 1))
  File "C:\Work\Projects\TimeErrorSimulationGUI\GUI\allantools.py", line 1053, in mtie_rolling_window
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
  File "C:\Users\antonp\AppData\Local\Programs\Python\Python36-32\lib\site-packages\numpy\lib\stride_tricks.py", line 102, in as_strided
    array = np.asarray(DummyArray(interface, base=x))
  File "C:\Users\antonp\AppData\Local\Programs\Python\Python36-32\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
```
The error happens when the window size reaches 1515 elements and the software tries to create a 436365 x 1515 array. I used Python 3.6.5.
In order to overcome the issue, I added a piece of code which kicks in when the original code hits the size limit (see the except part of the try/except block below) and continues the calculation:
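The original snippet is not reproduced above; the following is an illustrative sketch of a try/except fallback of the kind described (function names are made up, not the author's patch):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def mtie_one_window_loop(phase, w):
    """Loop-based MTIE for a single window length w.

    Uses O(1) extra memory instead of an (n - w + 1) x w array.
    Slower than the vectorized path for small w, competitive for
    large windows -- matching the behaviour reported above.
    """
    tie = 0.0
    for i in range(len(phase) - w + 1):
        win = phase[i:i + w]                  # a view, no copy
        tie = max(tie, float(win.max() - win.min()))
    return tie

def mtie_dev_for_window(phase, w):
    """Vectorized rolling-window MTIE with a loop fallback."""
    try:
        shape = (phase.shape[-1] - w + 1, w)
        strides = (phase.strides[-1], phase.strides[-1])
        rw = as_strided(phase, shape=shape, strides=strides)
        return float(np.max(np.max(rw, axis=1) - np.min(rw, axis=1)))
    except ValueError:                        # "array is too big"
        return mtie_one_window_loop(phase, w)
```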