Current issues with VI #44

Closed
bwbaker1 opened this issue Nov 25, 2024 · 4 comments

bwbaker1 commented Nov 25, 2024

Description
We are working with a set of guidelines and constraints for calculating VIs and want to ensure that all pixels in the input dataset adhere to these rules. Here are the key points of Junchang's findings.

  1. Zero Reflectance Should Be Filtered Out:
  • Any pixel with a reflectance value of zero in any band (blue, green, red, NIR, SWIR1, SWIR2) should be excluded from the calculation of VIs. Zero reflectance might indicate an issue with the data (e.g., a missing or erroneous value), and such values can distort the VI results.
  2. Negative or Zero Reflectance:
  • If a pixel has a negative or zero reflectance value in any of the input bands (blue, green, red, NIR, SWIR1, SWIR2), all VIs for that pixel should be assigned a No Data value. This rule ensures we do not calculate indices for unreliable or problematic pixels.
  3. Consistent Rounding:
  • Python's round() function has unpredictable behavior with negative numbers (it can round towards or away from zero).
  • The suggested approach is to avoid round() altogether and use floor(x + 0.5) instead, which rounds consistently via the floor function. This avoids ambiguity by rounding to the nearest integer away from zero for positive numbers and towards zero for negative numbers, providing a more consistent result (a short demonstration follows this list).
  4. Granule ID:
  • Granule metadata should include the granule ID.
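
For a quick demonstration of the rounding point (this example is mine, not from Junchang's notes), compare Python 3's built-in round(), which breaks ties toward the nearest even integer, with floor(x + 0.5), which always breaks ties upward:

>>> import math
>>> [round(x) for x in (0.5, 1.5, 2.5, -0.5, -1.5)]
[0, 2, 2, 0, -2]
>>> [math.floor(x + 0.5) for x in (0.5, 1.5, 2.5, -0.5, -1.5)]
[1, 2, 3, 0, -1]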

Summary

  1. Zero reflectance should be filtered out.
  2. Pixels with negative or zero reflectance in any band should have NO_DATA for all VIs (a sketch of rules 1-2 follows below).
  3. Do not use the round() function; use floor(x + 0.5) instead. Python's round() is unpredictable.
  4. Include granule ID in metadata.
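
A minimal sketch of how rules 1-2 might look in NumPy (the band arrays and the placeholder NO_DATA value are illustrative, not taken from the codebase):

>>> import numpy as np
>>> NO_DATA = -9999.0  # placeholder fill value; see the fill-value discussion below
>>> red = np.array([0.10, 0.00, 0.25])  # stand-ins for two of the six input bands
>>> nir = np.array([0.40, 0.30, -0.05])
>>> invalid = (red <= 0) | (nir <= 0)  # would extend across blue/green/red/NIR/SWIR1/SWIR2
>>> invalid
array([False,  True,  True])
>>> ndvi = np.where(invalid, NO_DATA, (nir - red) / (nir + red))

Every VI computed for a flagged pixel gets the fill value, per rule 2.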

Junchang's notes

ceholden commented Dec 4, 2024

Hey @junchangju, thanks for identifying these issues and writing them up. I think the first two are very straightforward but I had one clarifying question about reflectances over 100%. I also have one question on the rounding issue you identified.

Filtering invalid pixels

Should we also mask values that are >100% reflectance in addition to <0% reflectance (negative values)? I'm assuming "no" but I wanted to double check before getting started.

Rounding

For rounding, I wanted to call out that we're not using the Python standard library's round() function but rather the rounding function from NumPy. NumPy's round function is not unpredictable; it follows the "rounding half to even" approach, which is the default rounding method specified by IEEE 754 (e.g., per Intel) because it reduces bias. The alternative you proposed, floor(x + 0.5), is usually called "rounding half up".

The behavior you identified ("rounding to the nearest integer away from zero for positive numbers and towards zero for negative numbers") isn't quite accurate. The examples below show that NumPy rounds up or down according to whichever produces the closest even integer, and this method avoids the bias that "rounding half up" introduces.

e.g.,

# two positive, two negative
>>> import numpy as np
>>> example = np.array([0.5, 1.5, -0.5, -1.5])
>>> example.mean()
np.float64(0.0)

# "rounding half to even"
>>> np.rint(example)
# first positive is rounded down to 0
# second positive is rounded up to 2
# first negative is rounded up to 0
# second negative is rounded down to -2
array([ 0.,  2., -0., -2.])
>>> np.rint(example).mean()
np.float64(0.0)  # no bias

# versus "rounding half up"
>>> np.floor(example + 0.5)
array([ 1.,  2.,  0., -1.])
>>> np.floor(example + 0.5).mean()
np.float64(0.5)  # bias

To use the examples from your doc with NBR values,

# I added a few values around the two NBR values given as examples
>>> nbr = np.array([-0.45625, -0.45615, -0.45635, -0.84375, -0.84365, -0.84385]) * 10_000
>>> np.round(nbr)
array([-4562., -4562., -4564., -8438., -8436., -8438.])

We can switch to "rounding half up" if that's the recommendation, but I wanted to double check after clarifying that NumPy's rounding behavior isn't unpredictable and is intended to avoid bias when rounding populations of numbers.

ceholden commented Dec 4, 2024

[Reposting from Slack]

It looks like we're using -9999 as the "no data" value for vegetation indexes, which are stored after being scaled by 10,000. This puts the "no data" value inside the valid range of many vegetation indexes, which span [-1, 1] (or [-10,000, 10,000] when scaled). In other words, we might be declaring a valid pixel "no data" just because the value we computed was -9999.

The Landsat NDVI product made this mistake in Collection 1 (nodata=-9999 originally) and fixed it in Collection 2 by setting the fill value to -19999: https://www.usgs.gov/landsat-missions/landsat-normalized-difference-vegetation-index

I've also done this in my past work 😄 and the impact is rare but does exist. I think we should avoid this mistake too, and I'd suggest we select -19999 like the USGS did for their Landsat vegetation index product. Does that make sense as a change to adopt, or are there other suggestions? I've also used the dtype's maximum or minimum value for vegetation index "nodata" values before (e.g., 32,767).
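
To make the collision concrete (a hypothetical value, not one observed in our data), a perfectly valid NBR of -0.9999 lands exactly on the current fill value once scaled:

>>> round(-0.9999 * 10_000)
-9999

With -19999 the fill value sits outside the scaled [-10,000, 10,000] range, so no valid pixel can collide with it.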

junchangju commented

@ceholden Hi Chris,
Thank you for going through my numerical examples. You have raised excellent points and the one on rounding is educational to me.

  1. Let's keep >1.0 reflectance for now, but document it in the VI user's guide (I'll do it). The >1.0 reflectance is mostly over snow; it is a known issue and users will be wary of it. It can also be caused by topography, because a flat surface is assumed in our methodology so far. In some rare cases an object could reflect much more strongly than an isotropic one and have a greater-than-1.0 reflectance. The latter two cases are valid.
  2. I accept your suggestion to change the nodata value to -19999. -9999 is sufficient for reflectance, but not for VIs in some rare instances.
  3. Thanks for sharing the knowledge on NumPy rounding. "Rounding half to even" makes sense for images because a huge number of pixels are involved. The other team members will make sure to use NumPy for verification from now on.
  4. The overflow and underflow check I suggested is based on my imagination and I don't know if it will ever happen, but I think it is good to add.


ceholden commented Dec 5, 2024

Thanks for your reply @junchangju! I really appreciate the explanation of reflectances >100% and their causes, such as topography; it's something I've known about but didn't know the reasons for. I also learned about rounding as part of this work.

Specifically to reply to your points,

  1. +1, makes sense to me. Right now we mask >100% reflectance (see line of code) in the reflectance bands before calculating vegetation indexes. This will need to change so that we no longer mask >100% reflectance and instead mask 0% reflectance. More precisely, we should mask anything below the lowest possible value (1 in scaled terms, or 1/10,000 = 0.0001); a sketch of the updated mask is at the end of this comment.
  2. +1 thanks for checking! I'll update the fill value we're using to be -19,999 to keep outside of the [-10_000, 10_000] range
  3. +1, thanks for considering! I'll keep the rounding method and add a note in our code about the behavior. If you're writing in C you might look into the GNU C Library, which supports a few rounding modes and defaults to "round half to even": https://www.gnu.org/software/libc/manual/html_node/Rounding.html
  4. I'll look into how NumPy is helping us handle underflows & overflows. I agree this issue is probably not likely to happen, but I think there is one location in the code that we could change to ensure it never happens. Specifically I think we can try to avoid "wraparound" where large floating point values "wrap around" the bounds of an integer data type when recast. See the code sample below,

To avoid overflow/underflow-related wraparound:

# Large floating point values ~> unlikely but could happen
>>> import numpy as np
>>> large_floats = np.array([-1e100, 1e100], dtype="float64")
>>> large_floats
array([-1.e+100,  1.e+100])

# If we simply downcast these to integers the values will overflow and "wrap around"
# We get a warning for this but we aren't catching it right now
>>> large_floats.astype(np.int16)
RuntimeWarning: invalid value encountered in cast
  large_floats.astype(np.int16)
array([ 0, -1], dtype=int16)

# Instead we could first clip to the largest values supported by int16 before downcasting
>>> np.clip(large_floats, a_min=np.iinfo("int16").min, a_max=np.iinfo("int16").max).astype(np.int16)
array([-32768,  32767], dtype=int16)
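
And for item 1, a rough sketch of the updated reflectance mask (the array values and names are made up; this assumes reflectance is stored as int16 scaled by 10,000):

>>> # keep >100% reflectance (10500) but mask anything below the lowest valid value (1)
>>> scaled_refl = np.array([-100, 0, 1, 5000, 10500], dtype=np.int16)
>>> valid = scaled_refl >= 1
>>> valid
array([False, False,  True,  True,  True])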
