-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance enhancement Variogram in _calc_groups()
method (scikit-gstat #145)
#148
base: main
Are you sure you want to change the base?
Conversation
Hi there, thanks a lot for getting started! The error message is caused by Once, that part is finished, the error messages should be resolved. I hope this already helps, otherwise I can free some time tomorrow to give a more complete answer and/or commit to this branch, if needed. Best, |
and btw. the |
I think I can start with that. It will probably take some time to progress with this, as I will work on this in the spare minutes between other projects. |
Yeah, same for me. Just drop me a notice when you want me to contribute somehow |
Took me some time, but I am wondering if the small changes in the next commit would do the trick. As the |
I guess that is right. I have to defend my PhD thesis in two weeks, therefore I can't test or contribute right now. But there should be tests, that will fail if the changes yield different results. From mid July on, I can start testing and contributing myself. |
That's alright. I can see that that has priority. Good luck defending! |
Hey, I have a short update. While working on another project I had to do something really similar (with pytorch) and quickly checked the respective Numpy functions. Consider to replace the # vario is the skg.Variogram class
digi = np.digitize(vario.distance, vario.bins, right=True)
grp = np.where(digi == vario.n_lags, -1, digi) This will yield exactly the same internal assert (grp == vario._groups).all() This should get rid of the for loop. In terms of speed, I ran this short test: def digi(vario):
grp = np.digitize(vario.distance, vario.bins, right=True)
grp2 = np.where(grp == vario.n_lags, -1, grp)
for n in (30, 50, 100, 300, 500):
coords, vals = skg.data.pancake(N=n, seed=42).get('sample')
vario = skg.Variogram(coords, vals, maxlag=0.6, n_lags=25)
print(F"N = {n}")
print(f"{round(timeit.timeit(lambda: vario._calc_groups(force=True), number=100) * 10, 2)}ms")
print(f"{round(timeit.timeit(lambda: digi(vario), number=100) * 10, 2)}ms\n")
If you want, and you agree, you are welcome to to push these changes. Then I can put my code review here and the contribution would be associated with you after the merge (as it was your idea). Best |
Yes, that sounds good. This seems to be the easier solution. I already noticed that not labeling with |
This is the pytest output |
Ok, the fail is not associated to the changes made here. There is an optional dependency on About the performance drop compared to your tests: First, the If you are happy with the changes, I can go ahead and merge? |
Running this:
I get the following: Execution time is 111.75757568539993 seconds |
I made a start here. Some questions:
The
_group
names should be1
up tolen(bin_edges) - 1
, and where grouplen(bin_edges)
should be-1
, as this last group will be leftout of the variogram? If that is correct, I need to figure out how this can be done efficiently.I ran
pytest
but I got a lot of failures which say: "ValueError:ydata
must not be empty!". What does this mean?