Performance issues on google cloud (and beyond) #6
Comments
I have just tried my older method using `gsw`:

```python
import gsw
import xarray as xr

def _sigma0(lon, lat, z, temp, salt):
    pr = gsw.conversions.p_from_z(-z, lat)   # pressure (dbar) from depth
    sa = gsw.SA_from_SP(salt, pr, lon, lat)  # Absolute Salinity from Practical Salinity
    ct = gsw.CT_from_pt(sa, temp)            # Conservative Temperature from potential temperature
    sigma0 = gsw.sigma0(sa, ct)              # potential density anomaly at 0 dbar
    return sigma0

def reconstruct_sigma0(lon, lat, z, temp, salt):
    kwargs = dict(dask="parallelized", output_dtypes=[salt.dtype])
    ds_sigma0 = xr.apply_ufunc(_sigma0, lon, lat, z, temp, salt, **kwargs)
    return ds_sigma0
```

For now that works (it takes long and has high memory use, but the above example actually crashes eventually). It is clunky because of the latitude dependence, though, and I would like to use a more performant implementation in the future. Just wanted to add another data point.

EDIT: I spoke too soon. This failed with some obscure broadcasting issue. Could it be that there is some new bug in `xr.apply_ufunc`?
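A possible workaround for the broadcasting failure, sketched here as an untested guess rather than a confirmed fix: broadcast every argument to a common shape before `apply_ufunc` sees it, using the functions defined above.

```python
# Untested sketch: xr.broadcast (lazy for dask-backed arrays) aligns all
# inputs to the same dimensions, which may sidestep a shape mismatch
# inside the parallelized ufunc.
lon_b, lat_b, z_b, temp_b, salt_b = xr.broadcast(lon, lat, z, temp, salt)
sigma0 = reconstruct_sigma0(lon_b, lat_b, z_b, temp_b, salt_b)
```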
Julius, can you try this with the latest master? In #5, @cspencerjones implemented the xarray wrapper layer, so you should not have to call `apply_ufunc` at all. I'm not sure this makes any difference, but I would like to see.
I should have mentioned this earlier. This is installed from the latest master (as of today).

I am having trouble reproducing this behavior, which makes me think that this might be another problem on the Google side?
Oh, I think I misunderstood. You mean use it like this?

EDIT: I just tried it and it gives me a rather obscure error:
One possible issue is that salinity and temperature for this model are chunked differently in time 🙀. I rechunked them right after import, and it seems to at least make the cluster die more slowly, but it is still not good.
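For concreteness, a sketch of the rechunking step described, with the dataset handle and chunk size as assumptions:

```python
# Hypothetical example: put salt and temperature on the same time chunking
# right after opening the dataset, so downstream elementwise operations
# line up chunk-for-chunk. The chunk size of 12 is arbitrary.
ds = ds.chunk({"time": 12})
so = ds.so          # practical salinity
thetao = ds.thetao  # potential temperature
```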
An update from my side. Thanks to @cspencerjones I was able to get this going. I tried 3 different approaches.

3. It would be great if (2.) worked out of the box, but I think adding (3.) to the docs would be a great step forward already. This really makes me wonder if there is an underlying problem with …
You are not supposed to import from either `jmd95wrapper` or `jmd95numba`. Just import `jmd95`. That will import from `jmd95wrapper`. This is our only public API: `fastjmd95/fastjmd95/__init__.py`, line 1 at d0cd78f.
It looks like 1 and 3 do the same thing, no? See `fastjmd95/fastjmd95/jmd95wrapper.py`, lines 19–20 at d0cd78f.
We should also probably be using a more dynamic …
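If this refers to the `output_dtypes` handling (an assumption on the reading), here is a hypothetical sketch, not the actual fastjmd95 source, of deriving the output dtype from the input instead of hardcoding it:

```python
import xarray as xr

def wrap_eos(func, s, t, p):
    # Hypothetical wrapper: take the output dtype from the salinity input
    # rather than hardcoding float64, so float32 model output stays float32.
    return xr.apply_ufunc(
        func, s, t, p,
        dask="parallelized",
        output_dtypes=[s.dtype],
    )
```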
There is certainly a difference for my use case, but now I am even more confused as to why. Is it the dtype? Let me check that on my inputs... EDIT: Both of my inputs are …
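The quick check described would look something like this (variable names assumed from earlier in the thread):

```python
# If both inputs are float32 but the wrapper forces float64 output, that
# alone doubles the memory footprint of the result.
print(so.dtype, thetao.dtype)
```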
OK, that makes sense, but the information in the notebook needs to be updated (that was what I based my trial on).
This is related to issues @stb2145 was having today. |
I am using fastjmd95 to infer potential density from CMIP6 models. I have recently experienced performance issues in a complicated workflow, but I think I can trace some of it back to the step involving fastjmd95.
Here is a small example that reproduces the issue:
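A hedged sketch of this kind of setup follows; the store path, variable names, and use of an anonymous GCS filesystem are placeholder assumptions, not the exact example from this report:

```python
import xarray as xr
import gcsfs
from fastjmd95 import rho

# Open a CMIP6 zarr store from Google Cloud Storage; the path below is a
# placeholder, not the store used in this report.
fs = gcsfs.GCSFileSystem(token="anon")
store = fs.get_mapper("gs://cmip6/...")
ds = xr.open_zarr(store, consolidated=True)

so = ds.so          # practical salinity
thetao = ds.thetao  # potential temperature

# Potential density anomaly referenced to the surface (p = 0 dbar).
sigma_0 = rho(so, thetao, 0) - 1000
```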
I then performed some tests on the Google Cloud deployment (a dask cluster with 5 workers).

When I trigger a computation on one of the variables that is simply read from storage (`so.mean().load()`), everything works fine: the memory load stays low and the task stream is dense. But when I try the same with the derived variable (`sigma_0.mean().load()`), things look really ugly: the memory fills up almost immediately and spilling to disk starts. From the Progress pane it seems like dask is trying to load a large chunk of the dataset into memory before the `rho` calculation is applied.

To me it seems like the scheduler is going "wide" on the task graph rather than "deep" (which could free up memory sooner). I am really not good enough to diagnose what is going on with dask, but any tips would be much appreciated.
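One quick check, prompted by the chunk-alignment point raised earlier in the thread (variable names follow the sketch above; this is a diagnostic guess, not a confirmed cause):

```python
# If so and thetao are chunked differently in time, every sigma_0 chunk
# depends on many input chunks, and dask may hold large portions of both
# arrays in memory at once -- matching the "wide" behavior described.
print(so.chunks)
print(thetao.chunks)
```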