diff --git a/docs/index.rst b/docs/index.rst index 1a8f0058e6b..e91379d6f3d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -30,6 +30,7 @@ torchgeo tutorials/getting_started tutorials/pytorch + tutorials/geospatial tutorials/custom_raster_dataset tutorials/transforms tutorials/indices diff --git a/docs/tutorials/geospatial.ipynb b/docs/tutorials/geospatial.ipynb new file mode 100644 index 00000000000..844ce5b8747 --- /dev/null +++ b/docs/tutorials/geospatial.ipynb @@ -0,0 +1,355 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "45973fd5-6259-4e03-9501-02ee96f3f870", + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright (c) Microsoft Corporation. All rights reserved.\n", + "# Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "id": "9478ed9a", + "metadata": { + "id": "NdrXRgjU7Zih" + }, + "source": [ + "# Introduction to Geospatial Data\n", + "\n", + "_Written by: Adam J. Stewart_\n", + "\n", + "In this tutorial, we introduce the challenges of working with geospatial data, especially remote sensing imagery. This is not meant to discourage practicioners, but to elucidate why existing computer vision domain libraries like torchvision are insufficient for working with multispectral satellite imagery." + ] + }, + { + "cell_type": "markdown", + "id": "4cc902a5-0a06-4b02-af47-31b124da8193", + "metadata": {}, + "source": [ + "## Common Modalities\n", + "\n", + "Geospatial data come in a wide variety of common modalities. Below, we dive into each modality and discuss what makes it unique." + ] + }, + { + "cell_type": "markdown", + "id": "7d02bf4d-e979-4d41-bf70-e1b5a73bac2f", + "metadata": {}, + "source": [ + "### Tabular data\n", + "\n", + "Many geospatial datasets, especially those collected by in-situ sensors, are distributed in tabular format. For example, imagine weather or air quality stations that distribute example data like:\n", + "\n", + "| Latitude | Longitude | Temperature | Pressure | PM$_{2.5}$ | O$_3$ | CO |\n", + "| -------: | --------: | ----------: | -------: | ---------: | ----: | -----: |\n", + "| 40.7128 | 74.0060 | 1 | 1025 | 20.0 | 4 | 473.9 |\n", + "| 37.7749 | 122.4194 | 11 | 1021 | 21.4 | 6 | 1259.5 |\n", + "| ... | ... | ... | ... | ... | ... | ... |\n", + "| 41.8781 | 87.6298 | -1 | 1024 | 14.5 | 30 | - |\n", + "| 25.7617 | 80.1918 | 17 | 1026 | 5.0 | - | - |\n", + "\n", + "This kind of data is relatively easy to load and integrate into a machine learning pipeline. The following models work well for tabular data:\n", + "\n", + "* Multi-Layer Perceptrons (MLPs): for unstructured data\n", + "* Recurrent Neural Networks (RNNs): for time-series data\n", + "* Graph Neural Networks (GNNs): for ungridded geospatial data\n", + "\n", + "Note that it is not uncommon for there to be missing values (as is the case for air pollutants in some cities) due to missing or faulty sensors. Data imputation may be required to fill in these missing values. Also make sure all values are converted to a common set of units." + ] + }, + { + "cell_type": "markdown", + "id": "b0076503-57d4-4803-b7ba-dc6b96dd5cf8", + "metadata": {}, + "source": [ + "### Multispectral\n", + "\n", + "Although traditional computer vision datasets are typically restricted to red-green-blue (RGB) images, remote sensing satellites typically capture 3–15 different spectral bands with wavelengths far outside of the visible spectrum. Mathematically speaking, each image will be formatted as:\n", + "\n", + "$$ x \\in \\mathbb{R}^{C \\times H \\times W},$$\n", + "\n", + "where:\n", + "\n", + "* $C$ is the number of spectral bands (color channels),\n", + "* $H$ is the height of each image (in pixels), and\n", + "* $W$ is the width of each image (in pixels).\n", + "\n", + "Below, we see a false-color composite created using spectral channels outside of the visible spectrum (such as near-infrared):\n", + "\n", + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "03de4b08-3941-4b76-9eb1-11a57a7c9684", + "metadata": {}, + "source": [ + "### Hyperspectral\n", + "\n", + "While multispectral images are often limited to 3–15 disjoint spectral bands, hyperspectral sensors capture hundreds of spectral bands to approximate the continuous color spectrum. These images often present a particular challenge to convolutional neural networks (CNNs) due to the sheer data volume, and require either small image patches (decreased $H$ and $W$) or dimensionality reduction (decreased $C$) in order to avoid out-of-memory errors on the GPU.\n", + "\n", + "Below, we see a hyperspectral data cube, with each color channel visualized along the $z$-axis:\n", + "\n", + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "20c83407-0496-45af-b0e0-2c615b0d9a03", + "metadata": {}, + "source": [ + "### Radar\n", + "\n", + "Passive sensors (ones that do not emit light) are limited by daylight hours and cloud-free conditions. Active sensors such as radar emit polarized microwave pulses and measure the time it takes for the signal to reflect or scatter off of objects. This allows radar satellites to operate at night and in adverse weather conditions. The images captured by these sensors are stored as complex numbers, with a real (amplitude) and imaginary (phase) component, making it difficult to integrate them into machine learning pipelines.\n", + "\n", + "Radar is commonly used in meteorology (Doppler radar) and geophysics (ground penetrating radar). By attaching a radar antenna to a moving satellite, a larger effective aperature is created, increasing the spatial resolution of the captured image. This technique is known as synthetic aperature radar (SAR), and has many common applications in geodesy, flood mapping, and glaciology. Finally, by comparing the phases of multiple SAR snapshots of a single location at different times, we can analyze minute changes in surface elevation, in a technique known as Interferometric Synthetic Aperature Radar (InSAR). Below, we see an interferogram of earthquake deformation:\n", + "\n", + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "53b27a38-64cc-43e7-91da-1244fb9dd416", + "metadata": {}, + "source": [ + "### Lidar\n", + "\n", + "Similar to radar, lidar is another active remote sensing method that replaces microwave pulses with lasers. By measuring the time it takes light to reflect off of an object and return to the sensor, we can generate a 3D point cloud mapping object structures. Mathematically, our dataset would then become:\n", + "\n", + "$$\\mathcal{D} = \\left\\{\\left(x^{(i)}, y^{(i)}, z^{(i)}\\right)\\right\\}_{i=1}^N$$\n", + "\n", + "This technology is frequently used in several different application domains:\n", + "\n", + "* Meteorology: clouds, aerosols\n", + "* Geodesy: surveying, archaeology\n", + "* Forestry: tree height, biomass density\n", + "\n", + "Below, we see a 3D point cloud captured for a city:\n", + "\n", + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "da27ea73-2e3d-43d8-ba0b-cdac92ab2f81", + "metadata": {}, + "source": [ + "## Resolution\n", + "\n", + "Remote sensing data comes in a number of spatial, temporal, and spectral resolutions.\n", + "\n", + "
\n", + "Warning: In computer vision, resolution usually refers to the dimensions of an image (in pixels). In remote sensing, resolution instead refers to the dimensions of each pixel (in meters). Throughout this tutorial, we will use the latter definition unless otherwise specified.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "bcd51946-62eb-44db-bf3b-bd2df5308ab6", + "metadata": {}, + "source": [ + "### Spatial resolution\n", + "\n", + "Choosing the right data for your application is often controlled by the resolution of the imagery. Spatial resolution, also called ground sample distance (GSD), is the size of each pixel as measured on the Earth's surface. While the exact definitions change as satellites become better, approximate ranges of resolution include:\n", + "\n", + "| Category | Resolution | Examples |\n", + "| -------: | ---------: | :------: |\n", + "| Low resolution | > 30 m | MODIS (250 m–1 km), GOES-16 (500 m–2 km) |\n", + "| Medium resolution | 5–30 m | Sentinel-2 (10–60 m), Landsat-9 (15–100 m) |\n", + "| High resolution | 1–5 m | Planet Dove (3–5 m), RapidEye (5 m) |\n", + "| Very high resolution | < 1 m | Maxar WorldView-3 (0.3 m), QuickBird (0.6 m) |\n", + "\n", + "It is not uncommon for a single sensor to capture high resolution panchromatic bands, medium resolution visible bands, and low resolution thermal bands. It is also possible for pixels to be non-square, as is the case for OCO-2. All bands must be resampled to the same resolution for use in machine learning pipelines." + ] + }, + { + "cell_type": "markdown", + "id": "bce349d9-8b5e-48e4-800a-c2fbb1c343cb", + "metadata": {}, + "source": [ + "### Temporal resolution\n", + "\n", + "For time-series applications, it is also important to think about the repeat period of the satellite you want to use. Depending on the orbit of the satellite, imagery can be anywhere from biweekly (for polar, sun-synchronous orbits) to continuous (for geostationary orbits). The latter is common for global Earth observation missions, while the latter is common for weather and communications satellites. Below, we see an illustration of a geostationary orbit:\n", + "\n", + "
\n", + "\n", + "
\n", + "\n", + "Due to partial overlap in orbit paths and intermittent cloud cover, satellite image time series (SITS) are often of irregular length and irregular spacing. This can be especially challenging for naïve time-series models to handle." + ] + }, + { + "cell_type": "markdown", + "id": "6c278653-d14b-44c8-971b-9851b7515b0f", + "metadata": {}, + "source": [ + "### Spectral resolution\n", + "\n", + "It is also important to consider the spectral resolution of a sensor, including both the number of spectral bands and the bandwidth that is captured. Different downstream applications require different spectral bands, and there is often a tradeoff between additional spectral bands and higher spatial resolution. The following figure compares the wavelengths captured by sensors onboard different satellites:\n", + "\n", + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "c79a050a-5389-4fb4-8501-3f1be067a166", + "metadata": {}, + "source": [ + "## Preprocessing\n", + "\n", + "Geospatial data also has unique preprocessing requirements that necessitate experience working with a variety of tools like GDAL, the geospatial data abstraction library. GDAL support ~160 raster drivers and ~80 vector drivers, allowing users to reproject, resample, and rasterize data from a variety of specialty file formats." + ] + }, + { + "cell_type": "markdown", + "id": "5cadbfeb-40da-455f-8b9a-bb5a983aaa8b", + "metadata": {}, + "source": [ + "### Reprojection\n", + "\n", + "The Earth is three dimensional, but images are two dimensional. This requires a *projection* to map the 3D surface onto a 2D image, and a *coordinate reference system* (CRS) to map each point back to a specific latitude/longitude. Below, we see examples of a few common projections:\n", + "\n", + "
\n", + "
\n", + " \n", + "
Mercator
\n", + "
\n", + "
\n", + "\n", + "
\n", + "
\n", + " \n", + "
Albers Equal Area
\n", + "
\n", + "
\n", + "\n", + "
\n", + "
\n", + " \n", + "
Interrupted Goode Homolosine
\n", + "
\n", + "
\n", + "\n", + "There are literally thousands of different projections out there, and every dataset (or even different images within a single dataset) can have different projections. Even if you correctly georeference images during indexing, if you forget to project them to a common CRS, you can end up with rotated images with nodata values around them, and the images will not be pixel-aligned.\n", + "\n", + "
\n", + "\n", + "
\n", + "\n", + "We can use a command like:\n", + "\n", + "```\n", + "$ gdalwarp -s_srs EPSG:5070 -t_srs EPSG:4326 src.tif dst.tif\n", + "```\n", + "\n", + "to reproject a file from one CRS to another." + ] + }, + { + "cell_type": "markdown", + "id": "b0362b1e-1c47-4884-a414-699db82acb6e", + "metadata": {}, + "source": [ + "### Resampling\n", + "\n", + "As previously mentioned, each dataset may have its own unique spatial resolution, and even separate bands (channels) in a single image may have different resolutions. All data (including input images and target masks for semantic segmentation) must be resampled to the same resolution. This can be done using GDAL like so:\n", + "\n", + "```\n", + "$ gdalwarp -tr 30 30 src.tif dst.tif\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "ddbd742a-11b4-4fb9-a27a-58f5f14e2982", + "metadata": {}, + "source": [ + "Just because two files have the same resolution does not mean that they have *target-aligned pixels* (TAP). Our goal is that every input pixel is perfectly aligned with every expected output pixel, but differences in geolocation can result in masks that are offset by half a pixel from the input image. We can ensure TAP by adding the `-tap` flag:\n", + "\n", + "```\n", + "$ gdalwarp -tr 30 30 -tap src.tif dst.tif\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "c0c499d9-e266-4619-863f-e416cc823c58", + "metadata": {}, + "source": [ + "### Rasterization\n", + "\n", + "Not all geospatial data is raster data. Many files come in vector format, including points, lines, and polygons.\n", + "\n", + "
\n", + "\n", + "
\n", + "\n", + "Of course, semantic segmentation requires these polygon masks to be converted to raster masks. This process is called rasterization, and can be performed like so:\n", + "\n", + "```\n", + "$ gdal_rasterize -tr 30 30 -a BUILDING_HEIGHT -l buildings buildings.shp buildings.tif\n", + "```\n", + "\n", + "Above, we set the resolution to 30 m/pixel and use the `BUILDING_HEIGHT` attribute of the `buildings` layer as the burn-in value.\n" + ] + }, + { + "cell_type": "markdown", + "id": "a3acc64e-8dc0-46b4-a677-ecb9723d4f56", + "metadata": {}, + "source": [ + "## Additional Reading\n", + "\n", + "Luckily, TorchGeo can handle most preprocessing for us. If you would like to learn more about working with geospatial data, including how to manually do the above tasks, the following additional reading may be useful:\n", + "\n", + "* [GDAL documentation](https://gdal.org/en/stable/index.html)\n", + "* [rasterio documentation](https://rasterio.readthedocs.io/en/stable/index.html)\n", + "* [Guide to GeoTIFF compression and optimization with GDAL](https://kokoalberti.com/articles/geotiff-compression-optimization-guide/)\n", + "* [A survival guide to Landsat preprocessing](https://doi.org/10.1002/ecy.1730)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "getting_started.ipynb", + "provenance": [] + }, + "execution": { + "timeout": 1200 + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}