Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a trigger warning and update the indices tutorial #53

Merged
merged 4 commits into from
Sep 28, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions doc/citing.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
.. _citing:

Citing
======
Citing xlandsat
===============

This is research software **made by scientists**. Citations help us justify the
effort that goes into building and maintaining this project.
Expand Down
217 changes: 191 additions & 26 deletions doc/indices.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
.. _indices:

Indices
-------
Working with indices
--------------------

Indices calculated from multispectral satellite imagery are powerful ways to
quantitatively analyze these data.
Expand All @@ -10,70 +10,133 @@ differentiate between them.
Many of these indices can be calculated with simple arithmetic operations.
So now that our data are in :class:`xarray.Dataset`'s, it's fairly easy to
calculate them.
As an example, we'll use two example scenes from before and after the
`Brumadinho tailings dam disaster <https://en.wikipedia.org/wiki/Brumadinho_dam_disaster>`__
to try to image and quantify the total area flooded by the damn collapse.

As an example, let's load two example scenes from before and after the
`Brumadinho tailings dam disaster <https://en.wikipedia.org/wiki/Brumadinho_dam_disaster>`__:
.. admonition:: Trigger warning
:class: warning

This tutorial uses data from the tragic
`Brumadinho tailings dam disaster <https://en.wikipedia.org/wiki/Brumadinho_dam_disaster>`__,
in which over 250 people lost their lives. We use this dataset to
illustrate the usefulness of remote sensing data for monitoring such
disasters but we want to acknowledge its tragic human consequences.
**Some readers may find this topic disturbing and may not wish to read
futher.**

First, we must import the required packages, download our two sample scenes,
and load them with :func:`xlandsat.load_scene`:

.. jupyter-execute::

import xlandsat as xls
import matplotlib.pyplot as plt


path_before = xls.datasets.fetch_brumadinho_before()
path_after = xls.datasets.fetch_brumadinho_after()

before = xls.load_scene(path_before)
after = xls.load_scene(path_after)
after

Let's make RGB composites to get a sense of what these
two scenes contain:

.. jupyter-execute::

rgb_before = xls.composite(before, rescale_to=(0.03, 0.2))
rgb_after = xls.composite(after, rescale_to=(0.03, 0.2))

fig, axes = plt.subplots(2, 1, figsize=(10, 12), layout="tight")
for ax, rgb in zip(axes, [rgb_before, rgb_after]):
rgb.plot.imshow(ax=ax)
ax.set_title(rgb.attrs["title"])
ax.set_aspect("equal")
plt.show()

The dam is located at around 592000 east and -2225000 north. The after scene
clearly shows all of the red mud that flooded the region to the southwest of
the dam. Notice also the red tinge of the Paraopeba River in the after image
as it was contaminated by the mud flow.

.. tip::

See :ref:`composites` for more information on what we did above.

NDVI
----

We can calculate the
`NDVI <https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index>`__
for these scenes to see if we can isolate the effect of the flood following the
dam collapse:
dam collapse.
NDVI highlights vegetation, which we assume will have decreased in the after
scene due to the flood.
NDVI is defined as:

.. math::

.. jupyter-execute::
NDVI = \dfrac{NIR - Red}{NIR + Red}

ndvi_before = (before.nir - before.red) / (before.nir + before.red)
ndvi_after = (after.nir - after.red) / (after.nir + after.red)
which we can calculate with xarray as:

# Set some metadata for xarray to find
ndvi_before.attrs["long_name"] = "normalized difference vegetation index"
ndvi_before.attrs["units"] = "dimensionless"
ndvi_after.attrs["long_name"] = "normalized difference vegetation index"
ndvi_after.attrs["units"] = "dimensionless"
.. jupyter-execute::

ndvi_before = (before.nir - before.red) / (before.nir + before.red)
ndvi_before

And now we can make pseudo-color plots of the NDVI:
Now we can do the same for the after scene:

.. jupyter-execute::

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 12))
ndvi_after = (after.nir - after.red) / (after.nir + after.red)
ndvi_after

# Limit the scale to [-1, +1] so the plots are easier to compare
ndvi_before.plot(ax=ax1, vmin=-1, vmax=1, cmap="RdBu_r")
ndvi_after.plot(ax=ax2, vmin=-1, vmax=1, cmap="RdBu_r")
And add some metadata for xarray to find when making plots:

ax1.set_title(f"Before: {before.attrs['title']}")
ax2.set_title(f"After: {after.attrs['title']}")
.. jupyter-execute::

ax1.set_aspect("equal")
ax2.set_aspect("equal")
for ndvi in [ndvi_before, ndvi_after]:
ndvi.attrs["long_name"] = "normalized difference vegetation index"
ndvi.attrs["units"] = "dimensionless"
ndvi_before.attrs["title"] = "NDVI before"
ndvi_after.attrs["title"] = "NDVI after"

Now we can make pseudo-color plots of the NDVI from before and after the
disaster:

.. jupyter-execute::

fig, axes = plt.subplots(2, 1, figsize=(10, 12), layout="tight")
for ax, ndvi in zip(axes, [ndvi_before, ndvi_after]):
# Limit the scale to [-1, +1] so the plots are easier to compare
ndvi.plot(ax=ax, vmin=-1, vmax=1, cmap="RdBu_r")
ax.set_title(ndvi.attrs["title"])
ax.set_aspect("equal")
plt.show()

Finally, we can calculate the change in NDVI from one scene to the other by
taking the difference:

Tracking differences
--------------------

An advantage of having our data in :class:`xarray.DataArray` format is that we
can perform **coordinate-aware** calculations. This means that taking the
difference between our two arrays will take into account the coordinates of
each pixel and only perform the operation where the coordinates align.

We can calculate the change in NDVI from one scene to the other by taking the
difference:

.. jupyter-execute::

ndvi_change = ndvi_before - ndvi_after

# Add som metadata for xarray
ndvi_change.name = "ndvi_change"
ndvi_change.attrs["long_name"] = (
ndvi_change.attrs["long_name"] = "NDVI change"
ndvi_change.attrs["title"] = (
f"NDVI change between {before.attrs['date_acquired']} and "
f"{after.attrs['date_acquired']}"
)
Expand All @@ -88,16 +151,118 @@ taking the difference:
this case, there was an East-West shift between scenes but our calculations
take that into account.

Now lets plot it:
Now lets plot the difference:

.. jupyter-execute::


fig, ax = plt.subplots(1, 1, figsize=(10, 6))
ndvi_change.plot(ax=ax, vmin=-1, vmax=1, cmap="PuOr")
ax.set_aspect("equal")
ax.set_title(ndvi_change.attrs["title"])
plt.show()

There's some noise in the cloudy areas of both scenes in the northeast but
otherwise this plots highlights the area affected by flooding from the dam
collapse in purple at the center.


Estimating area
---------------

One things we can do with indices and their differences in time is calculated
**area estimates**. If we know that the region of interest has index values
within a given value range, the area can be calculated by counting the number
of pixels within that range (a pixel in Landsat 8/9 scenes is 30 x 30 = 900 m²).

First, let's slice our NDVI difference to just the flooded area to avoid the
effect of the clouds in North. We'll use the :meth:`xarray.DataArray.sel`
method to slice using the UTM coordinates of the scene:

.. jupyter-execute::

flood = ndvi_change.sel(
easting=slice(587000, 594000),
northing=slice(-2230000, -2225000),
)

fig, ax = plt.subplots(1, 1, figsize=(10, 6))
flood.plot(ax=ax, vmin=-1, vmax=1, cmap="PuOr")
ax.set_aspect("equal")
plt.show()

Now we can create a mask of the flood area by selecting pixels that have a high
NDVI difference. Using a ``>`` comparison (or any other logical operator in
Python), we can create a boolean (``True`` or ``False``)
:class:`xarray.DataArray` as our mask:

.. jupyter-execute::

# Threshold value determined by trial-and-error
flood_mask = flood > 0.3

# Add some metadata for xarray
flood_mask.attrs["long_name"] = "flood mask"

flood_mask

Plotting boolean arrays will use 1 to represent ``True`` and 0 to represent
``False``:

.. jupyter-execute::

fig, ax = plt.subplots(1, 1, figsize=(10, 6))
flood_mask.plot(ax=ax, cmap="gray")
ax.set_aspect("equal")
ax.set_title("Flood mask")
plt.show()

.. seealso::

Notice that our mask isn't perfect. There are little bloobs classified as
flood pixels that are clearly outside the flood region. For more
sophisticated analysis, see the image segmentation methods in
`scikit-image <https://scikit-image.org/>`__.

Counting the number of ``True`` values is as easy as adding all of the boolean
values (remember that ``True`` corresponds to 1 and ``False`` to 0), which
we'll do with :meth:`xarray.DataArray.sum`:

.. jupyter-execute::

flood_pixels = flood_mask.sum().values
print(flood_pixels)

.. note::

We use ``.values`` above because :meth:`~xarray.DataArray.sum` returns an
:class:`xarray.DataArray` with a single value instead of the actual number.
This is usually not a problem but it looks ugly when printed, so we grab
the number with ``.values``.

Finally, the flood area is the number of pixels multiplied by the area of each
pixel (30 x 30 m²):

.. jupyter-execute::

flood_area = flood_pixels * 30**2

print(f"Flooded area is approximately {flood_area:.0f} m²")

Values in m² are difficult to imagine so a good way to communicate these
numbers is to put them into real-life context. In this case, we can use the
`football pitches <https://en.wikipedia.org/wiki/Football_pitch>`__ as a unit
that many people will understand:

.. jupyter-execute::

flood_area_pitches = flood_area / 7140

print(f"Flooded area is approximately {flood_area_pitches:.0f} football pitches")

.. warning::

**This is a very rough estimate!** The final value will vary greatly if you
change the threshold used to generate the mask (try it yourself).
For a more thorough analysis of the disaster using remote-sensing data, see
`Silva Rotta et al. (2020) <https://doi.org/10.1016/j.jag.2020.102119>`__.
Loading