Spei, proper use of `loc` parameter in scipy dists #1653

coxipi · 2024-02-19T14:20:16Z

Pull Request Checklist:

This PR addresses an already opened issue (for bug fixes / features)
- This PR fixes #xyz
Tests for the changes have been added (for bug fixes / features)
- (If applicable) Documentation has been added / updated (for bug fixes / features)
CHANGES.rst has been updated (with summary of main changes)
- Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

Spei used to rely on an offset to ensure that distribution of values are positive when using distributions such as gamma, fisk. Now, instead, we properly use the loc parameter in those distributions that generalize the distributions and allow negative values. For instance, the gamma disitribution is a 2-parameter function, but in scipy it's generalized to 3 parameters, with a loc parameter. This extends the support of gamma from $x \in [0,\infty]$ to $x \in [\text{loc}, \infty]$.

The code was already compatible with this idea, just had to change _fit_start so it can compute estimates of the parameters in the cases with negatives values.

Does this PR introduce a breaking change?

Yes, removing the offset from SPEI.

Other information:

…_loc

CHANGES.rst

…_loc

coxipi · 2024-02-22T21:31:14Z

I'm pretty much done here I think, but I would like to test a bit more situations where we have negative values. I need to find realistic cases, not too extreme I suppose, although I don't see why the method would not work with very negative values. I have to see!

Shifting the values like:

pr = pr - f*pr.mean()

for $f \in [0,1]$, I find that the SPI values change more for gamma. For method ML, the error is somewhat constant, and for method APP, results get worse and worse as the shift is increased:

	Mean[(spi_shifted - spi)/spi]
gamma-ML	~7%
fisk-ML	~0.1%
gamma-APP	from 0.001% to 69%
fisk-APP	from 0.001% to 10%

I say error because I assume that the SPI should not change if we apply a constant change to all precipitations

coxipi · 2024-03-15T01:21:59Z

So I tried to obtain a proper estimation of loc parameter in the case of the gamma distribution, resorting to approximations as it's already the case when we have the two-parameter distribution.

It becomes a mess. Even with a significantly worse approximation, the expression is quite messy. It can be done, but maybe there is a simple way I'm not seeing.

loc is estimated in some of the distributions in _fit_start (e.g. GEV) so I will see if there's something obvious I'm not seeing. Otherwise, _fit_start will probably still work well to do what its name implies, start a fit, but otherwise should not be used as an approximate method (unless the intent IS to use a two parameter distribution). So, either improve _fit_start or limit its application as a method for SPI/SPEI.

coxipi · 2024-03-22T18:38:38Z

I will implement a very rudimentary estimation of the loc parameter using the quantiles:
Consider an ordered time series $X_1, X_2, \dots, X_n$. The rawest approximation would be: $\text{loc}^* = X_1$ which was the spirit of having an offset. By the definition of loc, we know we must have $\text{loc} \leq X_1$. A more refined approximation is: $\text{loc}^* = X_1 + (X_1-X_2)$, which is smaller. This might undershoot, i.e. $\text{loc}^* < \text{loc}^{\rm Truth}$. And so on.

If we want to use a specific loc parameter, this could be used through fitkwargs. For instance, in some cases, we might want to impose loc==0. We should probably restrict the access to APP method. I could see it be used when loc==0, but otherwise I would forbid this use.

In any cases, we should keep the zero inflated distributions for SPI/precips. I have a good idea of what would be good for many examples, I just need to figure out a proper way to organize things around these ideas.

I think adding a zero_inflated option to fit would be useful

…i_loc

coxipi · 2024-03-26T23:24:11Z

I had to change some results in test_indices. The most significant changes are for the fisk distribution since I changed significantly the approximations. We now have a more rigorous implementation of the method of moments (I used and <x^2>/^2.

The choice of starting parameters can significantly affect even the full-fledged numerical approach (e.g. "ML" method)

coxipi · 2024-04-15T22:43:33Z

I've been exploring the sensitivity to initial guesses in our procedure.

Consider what is done in test_indices.py (test_standardized_precipitation_index, etc), computing a few values of SPI/SPEI. I show the example with the fisk distribution, but I also saw this phenomena for gamma dist

freq = "D"
window = 1
dist = "fisk"
method = "ML"
ds = open_dataset("sdba/CanESM2_1950-2100.nc", cache=True, cache_dir=Path("/home/eridup1/tank/xclim_open_dataset/")).isel(location=1)
pr = ds.pr.sel(time=slice("1998", "2000"))
pr_cal = ds.pr.sel(time=slice("1950", "1980"))
params = xci.stats.standardized_index_fit_params(
    pr_cal, freq=freq, window=window, dist=dist, method=method, zero_inflated=True,
)
spi  = xci.standardized_precipitation_index(pr, params=params)
print(spi.isel(time=slice(-11, -1, 2)))

I used two ways to estimate loc in _fit_start, the spi results sometimes largely, sometimes barely

# loc=0
>> [-0.12736419  1.30823431  0.67651811  0.10446318 -0.50246943]
# loc = (x1 * xn - x2**2) / (x1 + xn - 2 * x2)
# x1 is the smallest precip in the dataset, x2 the second smallest, xn the largest
>> [-0.15894928  1.30822495  0.44984607  0.14669885 -0.50273746]

It makes me think in some cases, the same minimum is found, in other cases no.

The difficulty of optimization for 3-parameter gamma distribution is discussed in this issue, confirms my suspicion.
scipy/scipy#18775

They mention scipy.stats.fit may be more robust than scipy.stats.gamma.fit in the case of 3-params dist for instance

CHANGES.rst

missing 's' Co-authored-by: Vasco Schiavo <[email protected]>

coxipi · 2024-04-16T20:53:54Z

I did the computation for 20 values of loc0 between 0 and the guess I outlined above. There is some variability in the spi output, and that's true for any day of year:

doy=361 (the 0.104 vs. 0.147 spi values above):

another random doy I took, 180

The worst doy mentioned above, the one with 0.68 vs. 0.45

BONUS: using `scipy.stats.fit` instead of `scipy.stats.fisk.fit`

All in all, it does not fare much better. Also, in this case, you have to input reasonable range of values to search within, so maybe this part could be optimized a bit more to avoid large span of values like this.

All in all, I think things are controlled well enough. As a last study, I will consider how the R package performs in these circumstances.

coxipi · 2024-04-18T18:09:44Z

A bit of a tangent, but I was wondering how does this compare to the error in a single fit. I generate 30 points in a gamma dist with R, computed a fit + std errors, then computed a SPI for a hundred set of parameters within the error bars, this shows the inherent variability with this measure.

coxipi · 2024-04-18T19:35:14Z

I've been battling with R trying to obtain a confirmation through another package that indeed, MLE is difficult for 3-parameters function. I've had good success working with 2-params distributions that come with fitdistr and fitdistrplus, but as I try to use 3-parameters in additional packages (e.g. FAdist or qualityTools), it becomes very difficult. I keep getting error message concerning initial value in 'vmmin' is not finite, which as I understand should be solved by giving different starting parameters.

library(FAdist)
library(fitdistrplus)
data <- rgamma3(30, shape = 0.5, scale = 1e-4, thres=0)  # Using Gamma distribution for sample data
params = fitdistr(data,dgamma3, start=list(shape=0.5,scale=1e-4, thres=0))

Maybe there is something simple I'm missing, I will simply wrap up the PR now

coxipi added 3 commits February 16, 2024 17:20

_fit_start for gamma, fisk can accept negative values

2f8c293

remove offset, standardize_index as general-use func

c3e4ff9

add context="hydro"

2294163

github-actions bot added the indicators Climate indices and indicators label Feb 19, 2024

coxipi added 5 commits February 19, 2024 09:25

update CHANGES

36d3ce7

Keep code compatible with offset, add deprecation warnings

6a1809c

offset=0 by default -> produce behaviour of future xclim versions

a442fcc

MOre details in CHANGES, breaking

915e955

Merge branch 'master' of https://github.com/Ouranosinc/xclim into spi…

7c89f33

…_loc

Zeitsperre reviewed Feb 22, 2024

View reviewed changes

CHANGES.rst Outdated Show resolved Hide resolved

Merge branch 'master' of https://github.com/Ouranosinc/xclim into spi…

c2f020d

…_loc

coxipi added 2 commits February 22, 2024 17:05

update version in CHANGES

2c4a684

remove entries in old versions, CHANGES

be11a47

coxipi and others added 2 commits March 20, 2024 16:12

better estimators using method of moments for 2-p fisk

b989dbf

Merge branch 'main' into spi_loc

e708f85

coxipi added 3 commits March 22, 2024 17:50

add loc estimates in _fit_start for 'gamma' and 'fisk'

52bff86

Merge branch 'spi_loc' of https://github.com/Ouranosinc/xclim into sp…

7ed131c

…i_loc

APP method can only be used with floc=0

b20d7a9

VascoSch92 reviewed Apr 16, 2024

View reviewed changes

CHANGES.rst Outdated Show resolved Hide resolved

coxipi and others added 2 commits April 16, 2024 15:57

Update CHANGES.rst

30ab018

missing 's' Co-authored-by: Vasco Schiavo <[email protected]>

Merge branch 'main' into spi_loc

8935f25

coxipi closed this Apr 18, 2024

coxipi mentioned this pull request Apr 18, 2024

Proper use of loc parameter in gamma,fisk dists (SPI/SPEI) #1720

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spei, proper use of `loc` parameter in scipy dists #1653

Spei, proper use of `loc` parameter in scipy dists #1653

coxipi commented Feb 19, 2024 •

edited

Loading

coxipi commented Feb 22, 2024 •

edited

Loading

coxipi commented Mar 15, 2024

coxipi commented Mar 22, 2024 •

edited

Loading

coxipi commented Mar 26, 2024

coxipi commented Apr 15, 2024 •

edited

Loading

coxipi commented Apr 16, 2024 •

edited

Loading

coxipi commented Apr 18, 2024

coxipi commented Apr 18, 2024

Spei, proper use of loc parameter in scipy dists #1653

Spei, proper use of loc parameter in scipy dists #1653

Conversation

coxipi commented Feb 19, 2024 • edited Loading

Pull Request Checklist:

What kind of change does this PR introduce?

Does this PR introduce a breaking change?

Other information:

coxipi commented Feb 22, 2024 • edited Loading

coxipi commented Mar 15, 2024

coxipi commented Mar 22, 2024 • edited Loading

coxipi commented Mar 26, 2024

coxipi commented Apr 15, 2024 • edited Loading

coxipi commented Apr 16, 2024 • edited Loading

doy=361 (the 0.104 vs. 0.147 spi values above):

another random doy I took, 180

The worst doy mentioned above, the one with 0.68 vs. 0.45

BONUS: using scipy.stats.fit instead of scipy.stats.fisk.fit

coxipi commented Apr 18, 2024

coxipi commented Apr 18, 2024

Spei, proper use of `loc` parameter in scipy dists #1653

Spei, proper use of `loc` parameter in scipy dists #1653

coxipi commented Feb 19, 2024 •

edited

Loading

coxipi commented Feb 22, 2024 •

edited

Loading

coxipi commented Mar 22, 2024 •

edited

Loading

coxipi commented Apr 15, 2024 •

edited

Loading

coxipi commented Apr 16, 2024 •

edited

Loading

BONUS: using `scipy.stats.fit` instead of `scipy.stats.fisk.fit`