Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for sum of models and custom models #160

Merged
merged 31 commits into from
Oct 13, 2023

Conversation

rhugonnet
Copy link
Contributor

@rhugonnet rhugonnet commented Aug 11, 2023

Early draft! Stopped here to discuss the way forward, I prefer having your point of view before adjusting too many things in Variogram! 🙂

For now:

  • Added another test to check custom model definition (there were some already): test passed immediately,
  • Added a test to check custom model fitting: test didn't pass, required the adjustment below,
  • Added a boolean class attribute Variogram._is_model_custom defined during set_model(). This allows to separate the definition of bounds and p0 in Variogram.fit(), defined based the length of args from inspect.

However, doing it the current way, bounds and p0 will be quite poorly defined (at this point the script cannot tell which parameter matches which dimension). The user would likely have to pass them manually to reach a good fitting performance.
Additionally, the current implementation will raise issues in Variogram.describe() which then passes on to Variogram.parameters.

One way we could move forward would be to divide in 2 cases:

  1. Add specific support for models that are a combination of models existing in skgstat/models, which would allow ._get_bounds(), .describe(), .parameters to function normally and make it easy for the user;
  2. Add basic support for custom models, with a warning to the user that he needs to pass his own bounds and p0 to reach good fitting performance (in case those aren't passed).

To implement 1., however, I see several questions:

  • How to conveniently pass the combination of existing models to the model argument? I'm not sure of the best option. A dictionary such as model={"op": np.sum, "models": ["spherical", "gaussian", "matern"]} could do the job? But maybe this will get overly complex. Thinking about it, for something more complex than a sum, I'm not sure we'll be able to benefit much from knowing the nature of the models to infer "bounds" and "p0" efficiently anyways. We could simply leave it to case number 2, and add support just the sum for convenience, for which we can easily pass a list ["spherical", "gaussian", "matern"]?
  • How to return .parameters? Right now it's a list, should we simply concatenate into a longer one keeping always the same parameter order? Or switch to a dictionary?
  • For describe(), I guess we can simply detect that it is a combination of models and return a more detailed list of parameters.

To implement 2.:

  • We have to allow bounds and p0 to be passed by the user. I guess this should be both in Variogram.fit() and __init__()? We could add fit_bounds and fit_p0 in Variogram.__init__. But there's already a lot of arguments, so could also be in kwargs. What do you think?

That's all I noticed for now, but there might be a few additional things down the line! 😅

Resolves #159

@codecov
Copy link

codecov bot commented Aug 11, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (c2d63d1) 89.75% compared to head (9690d39) 90.74%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #160      +/-   ##
==========================================
+ Coverage   89.75%   90.74%   +0.99%     
==========================================
  Files          23       23              
  Lines        2274     2475     +201     
==========================================
+ Hits         2041     2246     +205     
+ Misses        233      229       -4     
Files Coverage Δ
skgstat/DirectionalVariogram.py 90.32% <100.00%> (+0.52%) ⬆️
skgstat/Variogram.py 96.36% <98.30%> (+0.73%) ⬆️

... and 7 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mmaelicke
Copy link
Owner

Hi @rhugonnet,

Nice, thanks for all the effort! I am currently on vacation, but will be back next Tuesday. Is that early enough for you, if I only add a detailed response then?

A few quick thoughts: passing p0 and bounds might be a viable extension for the user in any case. As you can also manually fit a variogram model using fit_range on init, or range on fit, I think we can add these parameters like that. I would use the kwargs on init for that.

Maybe an option to define a sum-combination of pre-defined functions would be to add something like: model='spherical+gaussian'. That would be convenient and does not break the current semantics, as the parameter accepts strings for pre-defined models anyway. It would need to internal refactoring, as the Variogram.model property returns a function wrapped with models.variogram, ie. to set the right __name__ attribute on the function object (to be used in describe() and plotting). We would need some wrapper here, that can construct the actual model, and fill the __name__ property with something meaningful.

I would like to think about the parameters for a moment longer, and give you some thoughts on how to handle the parameters then. I think it would be problematic to change the structure of Variogram.parameters as a number of interfaces ie. to gstools.CovModel or gstatsim.Interpolate rely on that list. But I am sure, we can come up with something useful here. Have to think about that after my vacation.... :)

Thanks again and you will hear from me soon,
Mirko

@rhugonnet
Copy link
Contributor Author

Sounds good, I like all ideas!

No problem, priority to holiday and I'm busy on other things next week anyway, will continue the week after next 🙂
Enjoy your vacation @mmaelicke! 😉

Copy link
Owner

@mmaelicke mmaelicke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for the contribution. I have only one comment regarding the docstring, but will approve anyway.

skgstat/Variogram.py Show resolved Hide resolved
@rhugonnet rhugonnet marked this pull request as draft August 21, 2023 04:47
@rhugonnet
Copy link
Contributor Author

Sorry, was still working on your comments!
Just finalized the logic to support model names such as "spherical+gaussian" across the Variogram methods that used _model.__name__. Now I need to add more tests + doc, for now all existing tests pass. I'll do that tomorrow!

@rhugonnet rhugonnet changed the title Add support for custom model fitting Add support for sum of models and custom model Aug 22, 2023
@rhugonnet
Copy link
Contributor Author

Alright, I think this PR is now at a good stage! 😄

In short: Variogram now fully supports model sums such as "spherical+gaussian", as well as custom models from callable!

To make this work for custom models: You've already seen 80% of the changes in the first draft, I just had to add a couple other exceptions in describe() and parameters, now it all works!

To make this work for sums: I introduced a _model_name class attribute, and added two non-public functions: _get_argpos_sum_models to get argument positions and _build_sum_models to create the summed model. Those are then used by set_model(), describe(), fit() and parameters to properly set/get the arguments of each model in the sum.
By default, describe() returns a dict with effective_range_1, sill_1, nugget_1, effective_range_2, and so on... to make it "human-readable". And parameters returns a list of those in order given by the model name ("spherical+gaussian" = spherical first, gaussian second).

Wrote a lot of tests, and added descriptions in the function docstrings!

Points left to finalize:

@rhugonnet rhugonnet marked this pull request as ready for review August 22, 2023 05:55
@mmaelicke
Copy link
Owner

Hi,

I really like your changes! I think it would be helpful if you could add an example, or maybe even a little tutorial notebook to the docs for others.

Concerning all the other points: I will think them through and give you my opinion on them.
Py 3.9 and 3.10 failing is happening for all branches right now, I will investigate what is going on there. Maybe that is related to bugfixes in sicpy, which might change the parameters estimated by curve_fit. Maybe we should run unittests way more fuzzy, as curve_fit is not really SciKit-GStat's business. Not sure yet.

I'll come back to this soon

@rhugonnet
Copy link
Contributor Author

Thanks for the feedback, will start on the tutorial and wait for the other points!

@rhugonnet rhugonnet changed the title Add support for sum of models and custom model Add support for sum of models and custom models Aug 24, 2023
@rhugonnet rhugonnet requested a review from mmaelicke September 21, 2023 21:59
@mmaelicke
Copy link
Owner

mmaelicke commented Sep 22, 2023

I will add my review as soon as possible.

I'm thinking I could also add a tutorial/technical note on fitting multiple range using both RasterMetricSpace + sum of models. But maybe in a different PR where we also add RasterMetricSpace to the API?

That sounds great! And yes, I would also do that in a separate PR.

Copy link
Owner

@mmaelicke mmaelicke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the changes, fixed a lot of stuff that was on the agenda for ages, along with the sum of models.
Do you have any idea, why the python 3.9 unittests are failing? is this still connected to the bug in curve_fit?

@rhugonnet
Copy link
Contributor Author

Just did some tests in this PR changing the requirements.txt, and it looks like the tests that fail for the stable_entropy binning come from updates in NumPy 1.25.
Additionally, it looks like the fix in SciPy 1.11.3 did not fix the issue for some of our cases... We still need 1.11.1 to pass. They're discussing some additional issues introduced in 1.11.2 here: scipy/scipy#19309.
I'll investigate and report a reproducible example of our issue in SciPy if I can get one!

@rhugonnet
Copy link
Contributor Author

Otherwise all passing with NumPy 1.24 and SciPy 1.11.1! 😅

@rhugonnet
Copy link
Contributor Author

For the SciPy issue persisting with 1.11.3, I maanged to make a reproducible example and opened an issue on SciPy directly, see: #161 (comment)

Investigating the NumPy one...

@rhugonnet
Copy link
Contributor Author

It looks like the NumPy issue started with 1.25, but I couldn't reproduce it locally or track down why exactly...
Maybe we could simply reduce the floating precision of the stable_entropy bin estimation to 0 decimals? It's not very far off

@mmaelicke
Copy link
Owner

It looks like the NumPy issue started with 1.25, but I couldn't reproduce it locally or track down why exactly... Maybe we could simply reduce the floating precision of the stable_entropy bin estimation to 0 decimals? It's not very far off

I was also not successful in finding the exact problem. I guess we go for less precision, but open an issue to remind ourselves to test again with newer numpy versions?

@rhugonnet
Copy link
Contributor Author

Yes, I agree!
I removed the NumPy version pinning, opened #166 and changed the test precision to 0 decimals.

For SciPy: they are going to revert the changes to how it was in 1.11.1 and before, and release a 1.11.4, but it might take some time. In the meantime, should we merge this PR with the version pinning in requirements.txt to <=1.11.1 and open an issue to unpin once they make the release? It would allow to test the other opened PRs without having this issue in the CI!

@mmaelicke
Copy link
Owner

I agree. Thanks for all the input, I am really glad that you are aware of all the changes happening in scipy.

@mmaelicke
Copy link
Owner

Hey @rhugonnet,

The pre-commit is now merged into main with #167. I just resolved the merge conflicts. There are some minor fixes, pre-commit complains about. I can't push on your fork, so could you fix the pre-commit stuff? Then we can finally merge and have a working repo again.

Thanks for all your effort

@rhugonnet
Copy link
Contributor Author

@mmaelicke Perfect, thanks! All done, we can merge 🙂
I'll open an issue to remind ourselves to unpin SciPy when their bug is fixed.

If you want to work directly on the PRs I opened, I think normally you can push on someone's fork (maybe depends on repo params? but usually works for me):

git remote add rhugonnet https://github.com/rhugonnet/scikit-gstat.git
git fetch rhugonnet --prune
git checkout rhugonnet/combine_models
# Will go in "detached HEAD" mode
# Make the changes and commit them
git push rhugonnet HEAD:combine_models

@mmaelicke
Copy link
Owner

If you want to work directly on the PRs I opened, I think normally you can push on someone's fork (maybe depends on repo params? but usually works for me):

Ah cool, I was unaware of that. Nice to see all the checks green again. I'll merge and go home :)

@mmaelicke mmaelicke merged commit 1c83f5a into mmaelicke:main Oct 13, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fit of a sum of variogram models
2 participants