Daskification and Numba #153
Comments
Hi @pgierz. Initially it was also my idea to use xarray everywhere, but my experience with OLD data restrained me from doing so at that time, and this is why I tried to keep the possibility for functions to use both pure numpy arrays and dask. Probably some of my problems with early versions of xarray and dask are solved by now, and it's not a problem to use xarray in most places. Let's see. At the end of the day, whether we decide to completely switch to the xarray representation or not, the most important thing for me is that pyfesom2 continues to be a collection of functions that are easy to extract and modify. This is how science is done: you invent new things, so it should be easy for the user to take a function and extract it completely from pyfesom. Admittedly, it's already not that easy, since we rely in many cases on the |
@koldunovn Rather than rewriting everything, we could maintain a backwards-compatible version in one of two ways: A) make "sister functions" with slightly modified names (or a slightly renamed module); B) check for some environment variable or some global setting and then do either numpy or dask with an if/else check. What do you think? |
The best way to make a decision is to play around with different variants. I still have to become more comfortable with what @suvarchal did. I hope to spend a good amount of time with pyfesom2 next week. |
I started a bit with the "global option" idea, modeling it a little bit after what the HoloViz people do. In principle, the user will do the following:

    import pyfesom2 as pf
    pf.dask_extension()

Or, for tools from the command line:

    $ export PYFESOM2_USE_DASK="true"
    $ pfinterp .....

If we incorporate this check directly into the function body, this would, unfortunately, make the code less "portable" in the sense that it clutters things and makes it a bit more difficult for users to just copy out a section they might want to incorporate into their own scripts without all the other pyfesom stuff around it. This is where the first idea would come in: we can write "sister functions", and then, depending on the current value, call either the normal version or the dask-enhanced version. That would look like this:

    def pfinterp(*args):
        if pf.config.use_dask:
            return _pfinterp_dask(*args)
        # ...regular body of pfinterp...

Now, before I start tearing the code apart and putting if/then everywhere, it would be good first to identify which functions or routines would benefit most from this. Also, would this be a maintainable approach, or rather would it be confusing? |
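A minimal sketch of how the global switch and the sister functions could fit together, under stated assumptions. Only the `PYFESOM2_USE_DASK` variable and the `dask_extension()` idea come from the comment above; `Config`, `_env_flag`, `interp`, `_interp_numpy` and `_interp_dask` are hypothetical names for illustration and not existing pyfesom2 API. The two backends are placeholders so the sketch runs without numpy or dask.

```python
import os
from dataclasses import dataclass, field


def _env_flag(name: str) -> bool:
    """Return True if the environment variable holds a truthy string."""
    return os.environ.get(name, "").strip().lower() in {"1", "true", "yes"}


@dataclass
class Config:
    # Hypothetical settings object; the env var is read when it is created.
    use_dask: bool = field(default_factory=lambda: _env_flag("PYFESOM2_USE_DASK"))


config = Config()


def dask_extension() -> None:
    """Hypothetical opt-in switch, mirroring the pf.dask_extension() idea."""
    config.use_dask = True


def _interp_numpy(values):
    # Placeholder for the regular, self-contained function body.
    return ("numpy", list(values))


def _interp_dask(values):
    # Placeholder for the dask-enhanced sister function.
    return ("dask", list(values))


def interp(values):
    """Thin dispatcher: the only place that looks at the global setting."""
    if config.use_dask:
        return _interp_dask(values)
    return _interp_numpy(values)
```

In this variant both backends live in their own functions, so either one could be copied out of the package without dragging the configuration check along; the dispatcher is the only piece that depends on global state.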
@pgierz The sister functions look like a good way forward to me. I would also heavily use the xarray accessor functions introduced by @suvarchal for the xarray-type data. So in the end we would have pure numpy and xarray with FESOM flavour as possible inputs for the functions. I hoped to spend this week mostly on pyfesom2 and fdiag, and I failed desperately. I will try to do it next week; so many things need fixing, including the currently terrible installation experience for the users... |
@pgierz thanks for the issue; among other things, it really helps me to clear my head and to consolidate and articulate my evolved views on this. A few more points in favor of the proposal, and probably more towards Xarrayfication first.
There is more clutter still left in my head, and I am eager to share more thoughts on how to do this incrementally without breaking existing functionality (I will follow up on that in later comments), but I would be curious to know whether it makes sense, to hear your comments, and whether you are convinced it is worth a try. |
Hello @suvarchal, Thanks for the comments, they mirror a lot of what I have been thinking. I'll be working on exascale analysis tools full time as of next week, and I thought I'd start with FESOM. Perhaps it would be good to plan a bit and split up the work, otherwise things will end up being done twice. I had a look yesterday at the accessor class; what I was missing there was a section in the module docstring with some examples. Maybe you could do that? In the meantime, I can have a look at simply wrapping the load functions with xarray where I can, and I'll keep a list (in a separate issue) of which ones work out of the box, and where we have problematic cases. As for plotting: I would say that is a second step. I have some ideas: we can maybe leverage the hvplot extension, but let's get everything in xarray and dask first. |
Ok, since @pgierz you have some time and steam to do this, let's try to make it happen. From my side I suggest the following: https://github.com/FESOM/pyfesom2/projects/4 Those are mostly my tasks, but I would like suggestions from your side on datasets, and maybe on the best ways to do things. I suggest @pgierz and @suvarchal create a daskification project and try to sum up what would be worth trying first. From my point of view it is important for us to decide conceptually how we handle the mesh: should it be part of the xarray data set, as Suvi implemented, or a separate object (that will make it easy to pass to functions, and allow pure data as an input)? I would also think about how to read the mesh. Do we still need to rely on ASCII files, or just switch to netCDF file of |
@koldunovn to not clutter this issue, I would open a new one to discuss exactly what to do with the mesh. I would prefer to have this one only be linked to dask things not solved by xarray. @suvarchal, let's keep this one open for discussion. I'll go through tomorrow morning and make a project as Nikolay suggested, and once we have the "grunt work" done, we can use this place to discuss any particular edge cases. |
Hello, please see and add tasks to https://github.com/FESOM/pyfesom2/projects/5 |
Just to reemphasise on this point - we plan to change the order of dimensions in the future versions of FESOM2 (will be |
I would just have 2 different functions. Unification is great, but I prefer to get some expected behaviour from the function instead of realising after an hour of debugging that I forgot to switch backends. But type annotations are really a good thing, so I would try to use them for future code/refactoring. |
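To make the "two different functions" idea concrete, here is a rough sketch with type annotations, not code from pyfesom2. The names `vertical_mean_numpy` and `vertical_mean_dask` are hypothetical, and averaging over the last axis by default is an assumption made only for this example (it says nothing about the actual FESOM2 dimension order). Each function accepts exactly one backend and fails early on the wrong input type, so there is no hidden switch to forget.

```python
from typing import TYPE_CHECKING

import numpy as np

if TYPE_CHECKING:  # only needed by type checkers, not at runtime
    import dask.array as da


def vertical_mean_numpy(data: np.ndarray, axis: int = -1) -> np.ndarray:
    """Eager mean along `axis` for an in-memory numpy array."""
    if not isinstance(data, np.ndarray):
        raise TypeError(f"expected numpy.ndarray, got {type(data).__name__}")
    return data.mean(axis=axis)


def vertical_mean_dask(data: "da.Array", axis: int = -1) -> "da.Array":
    """Lazy mean along `axis` for a dask array."""
    # Imported here so the numpy function works even without dask installed.
    import dask.array as da

    if not isinstance(data, da.Array):
        raise TypeError(f"expected dask.array.Array, got {type(data).__name__}")
    # Returns a lazy dask array; nothing is computed until .compute() is called.
    return data.mean(axis=axis)
```

Calling `vertical_mean_numpy` with a plain list or a dask array raises a `TypeError` immediately, which is the kind of predictable behaviour the comment above asks for.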
So the idea would be to first write the functions that take xarray data (and the mesh in one way or another) as input, and then assemble (some of) them in the accessor? |
To follow up on Nikolay's last comment: I would need an example of this to fully understand. I have read online about the accessors, but still haven't fully grasped how this would work on either the programmer end or the user end. If we want to have a spontaneous meeting, I can turn on my webex room after lunch, but let me know first. |
There are 2 example notebooks that use accessors: https://github.com/FESOM/pyfesom2/blob/master/notebooks/accessor_selections.ipynb I am fully booked until 16:00 |
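As a rough illustration of the accessor pattern being discussed (this is not taken from the linked notebooks), the sketch below uses xarray's `register_dataset_accessor` decorator. The accessor name `demo`, the method `select_box`, the dimension name `nod2` and the `lon`/`lat` coordinate names are assumptions made for this example only. On the programmer end, an accessor is a class that receives the dataset; on the user end, its methods appear under `ds.demo`.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("demo")  # hypothetical accessor name
class DemoAccessor:
    def __init__(self, ds: xr.Dataset) -> None:
        # xarray passes the dataset in when `ds.demo` is first accessed.
        self._ds = ds

    def select_box(self, lon_min, lon_max, lat_min, lat_max) -> xr.Dataset:
        """Keep only the nodes whose lon/lat fall inside the box."""
        ds = self._ds
        inside = (
            (ds["lon"] >= lon_min) & (ds["lon"] <= lon_max)
            & (ds["lat"] >= lat_min) & (ds["lat"] <= lat_max)
        )
        # Convert the boolean mask to integer positions along the node dimension.
        return ds.isel(nod2=np.flatnonzero(inside.values))


# User end: a tiny synthetic dataset with 2 time steps and 4 nodes.
ds = xr.Dataset(
    {"temp": (("time", "nod2"), np.arange(8.0).reshape(2, 4))},
    coords={
        "lon": ("nod2", np.array([-10.0, 0.0, 10.0, 20.0])),
        "lat": ("nod2", np.array([50.0, 55.0, 60.0, 65.0])),
    },
)
subset = ds.demo.select_box(-5, 15, 50, 62)  # keeps the two middle nodes
```

The function body inside `select_box` only needs a dataset, so it could equally be written first as a standalone function and then wrapped by the accessor, which matches the "write the functions first, assemble them in the accessor" order suggested earlier in the thread.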
Can we push the meeting a couple of days, to Wednesday or later? I am a bit occupied with looking after a sick kid in the meantime. |
Yes, was just an idea to get us all on the same page. Thursday afternoon or anytime Friday works for me |
not sure i get this, but do you mean..... there is no way around not using dimension based operations? that would fit my understanding. |
Both these times right now work for me. |
Well, you can still do it without, but it will require many more checks :) |
one more relevant one, still in a branch ( wasn't completely happy so didn't make a full fledged PR, I should do it soon 😄 ) |
Now I am not convinced we mean the same thing 😄, I guess I was confused by the double negative in the sentence 😄 |
A few thoughts on strategy to approach Daskification, all subject to discussion.
I will elaborate more in subsequent edits of this comment; right now it is a placeholder :) |
Hello,
This is more for planning and discussion: one of the first things I'd like to do in the exascale analytics project is to prepare PyFesom for "obscenely large data" (henceforth and forever to be abbreviated as OLD, since we are all getting old).
Convert all Numpy things into Dask Arrays
Where possible, apply Numba @jit decorators.
Try to convert as much as possible into Xarray.
Thoughts and suggestions for other changes are welcome 😃