diff --git a/docs/index.rst b/docs/index.rst index b9693ec905d..1a9f2f2bc22 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -31,6 +31,7 @@ torchgeo tutorials/getting_started tutorials/pytorch tutorials/geospatial + tutorials/torchgeo tutorials/contribute_non_geo_dataset tutorials/custom_raster_dataset tutorials/transforms diff --git a/docs/tutorials/torchgeo.ipynb b/docs/tutorials/torchgeo.ipynb new file mode 100644 index 00000000000..cc6e17c6170 --- /dev/null +++ b/docs/tutorials/torchgeo.ipynb @@ -0,0 +1,445 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "45973fd5-6259-4e03-9501-02ee96f3f870", + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright (c) Microsoft Corporation. All rights reserved.\n", + "# Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "id": "9478ed9a", + "metadata": { + "id": "NdrXRgjU7Zih" + }, + "source": [ + "# Introduction to TorchGeo\n", + "\n", + "_Written by: Adam J. Stewart_\n", + "\n", + "Now that we've seen the basics of PyTorch and the challenges of working with geospatial data, let's see how TorchGeo addresses these challenges." + ] + }, + { + "cell_type": "markdown", + "id": "34f10e9f", + "metadata": { + "id": "lCqHTGRYBZcz" + }, + "source": [ + "## Setup\n", + "\n", + "First, we install TorchGeo and all of its dependencies." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "019092f0", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install torchgeo" + ] + }, + { + "cell_type": "markdown", + "id": "4db9f791", + "metadata": { + "id": "dV0NLHfGBMWl" + }, + "source": [ + "## Imports\n", + "\n", + "Next, we import TorchGeo and any other libraries we need." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d92b0f1", + "metadata": { + "id": "entire-albania" + }, + "outputs": [], + "source": [ + "import os\n", + "import tempfile\n", + "from datetime import datetime\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from torch.utils.data import DataLoader\n", + "\n", + "from torchgeo.datasets import CDL, BoundingBox, Landsat7, Landsat8, stack_samples\n", + "from torchgeo.datasets.utils import download_and_extract_archive\n", + "from torchgeo.samplers import GridGeoSampler, RandomGeoSampler" + ] + }, + { + "cell_type": "markdown", + "id": "b813beba-62ad-430c-96e5-1d81bef1e244", + "metadata": {}, + "source": [ + "## Motivation\n", + "\n", + "Let's start with a common task in geospatial machine learning to motivate us: land cover mapping. Imagine you have a collection of imagery and a land cover layer or *mask* you would like to learn to predict. In machine learning, this pixelwise classification process is referred to as *semantic segmentation*.\n", + "\n", + "More concretely, imagine you would like to combine a set of Landsat 7 and 8 scenes with the Cropland Data Layer (CDL). This presents a number of challenges for a typical machine learning pipeline:\n", + "\n", + "* We may have hundreds of partially overlapping Landsat images that need to be mosaiced together\n", + "* We have a single CDL mask covering the entire continental US\n", + "* Neither the Landsat input or CDL output will have the same geospatial bounds\n", + "* Landsat is multispectral, and may have a different resolution for each spectral band\n", + "* Landsat 7 and 8 have a different number of spectral bands\n", + "* Landsat and CDL may have a differerent CRS\n", + "* Every single Landsat file may be in a different CRS (e.g., multiple UTM zones)\n", + "* We may have multiple years of input and output data, and need to ensure matching time spans\n", + "\n", + "We can't have a dataset of length 1, and it isn't obvious what to do when the number, bounds, and size of input images differ from the output masks. Furthermore, each image is far too large to pass to a neural network. \n", + "\n", + "Traditionally, people either performed classification on a single pixel at a time or curated their own benchmark dataset. This works fine for training, but isn't really useful for inference. What we would really like to be able to do is sample small pixel-aligned pairs of input images and output masks from the region of overlap between both datasets. This exact situation is illustrated in the following figure:\n", + "\n", + "![Landsat CDL intersection](https://github.com/microsoft/torchgeo/blob/main/images/geodataset.png?raw=true)\n", + "\n", + "Now, let's see what features TorchGeo has to support this kind of use case." + ] + }, + { + "cell_type": "markdown", + "id": "41119706-0722-4fd0-85a7-787bb12bbab8", + "metadata": {}, + "source": [ + "## Datasets\n", + "\n", + "Geospatial data comes in a wide variety of formats. TorchGeo has two separate classes of datasets to deal with this dataset diversity:\n", + "\n", + "* `NonGeoDataset`: for curated benchmark datasets, where geospatial metadata is either missing or unnecessary\n", + "* `GeoDataset`: for uncurated raster and vector data layers, where geospatial metadata is critical for merging datasets\n", + "\n", + "We have already seen the former in the Introduction to PyTorch tutorial, as `EuroSAT100` is a subclass of `NonGeoDataset`. In this tutorial, we will focus on the latter and its advantages for working with uncurated data." + ] + }, + { + "cell_type": "markdown", + "id": "914b39c6-3373-4ae8-b9ea-d377e73e9fbe", + "metadata": {}, + "source": [ + "### Landsat\n", + "\n", + "First, let's start with our Landsat imagery. We will download a couple of Landsat 7 and 8 scenes, then pass them to builtin TorchGeo datasets for each." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "48d2a61d-16bb-4809-9da0-3bd369bff070", + "metadata": {}, + "outputs": [], + "source": [ + "landsat_root = os.path.join(tempfile.gettempdir(), 'landsat')\n", + "\n", + "url = 'https://hf.co/datasets/torchgeo/tutorials/resolve/ff30b729e3cbf906148d69a4441cc68023898924/'\n", + "landsat7_url = url + 'LE07_L2SP_022032_20230725_20230820_02_T1.tar.gz'\n", + "landsat8_url = url + 'LC08_L2SP_023032_20230831_20230911_02_T1.tar.gz'\n", + "\n", + "download_and_extract_archive(landsat7_url, landsat_root)\n", + "download_and_extract_archive(landsat8_url, landsat_root)\n", + "\n", + "landsat7_bands = ['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B7']\n", + "landsat8_bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']\n", + "\n", + "landsat7 = Landsat7(paths=landsat_root, bands=landsat7_bands)\n", + "landsat8 = Landsat8(paths=landsat_root, bands=landsat8_bands)\n", + "\n", + "print(landsat7)\n", + "print(landsat8)\n", + "\n", + "print(landsat7.crs)\n", + "print(landsat8.crs)" + ] + }, + { + "cell_type": "markdown", + "id": "ce12838a-1010-46cb-bcca-6379f9e327ac", + "metadata": {}, + "source": [ + "The following details are worth noting:\n", + "\n", + "* We ignore the \"coastal blue\" band of Landsat 8 because it does not exist in Landsat 7\n", + "* Even though all files are stored in the same directory, the datasets know which files to include\n", + "* `paths` can be a directory to recursively search, a list of local files, or even a list of remote cloud assets" + ] + }, + { + "cell_type": "markdown", + "id": "a51c5df2-5543-41ae-a9cf-254e29b6bdfd", + "metadata": {}, + "source": [ + "### CDL\n", + "\n", + "Next, let's do the same for the CDL dataset. We are using a smaller cropped version of this dataset to make the download faster." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "909d233b-b212-48f1-b910-3065f8fcf083", + "metadata": {}, + "outputs": [], + "source": [ + "cdl_root = os.path.join(tempfile.gettempdir(), 'cdl')\n", + "\n", + "cdl_url = url + '2023_30m_cdls.zip'\n", + "\n", + "download_and_extract_archive(cdl_url, cdl_root)\n", + "\n", + "cdl = CDL(paths=cdl_root)\n", + "\n", + "print(cdl)\n", + "print(cdl.crs)" + ] + }, + { + "cell_type": "markdown", + "id": "571a6512-494f-401a-bf2f-599f28b2fad5", + "metadata": {}, + "source": [ + "Again, the following details are worth noting:\n", + "\n", + "* We could actually ask the `CDL` dataset to download our data for us by adding `download=True`\n", + "* All datasets have different spatial extents\n", + "* All datasets have different CRSs" + ] + }, + { + "cell_type": "markdown", + "id": "4a15b938-3277-46bc-86e4-a5d7f57e838a", + "metadata": {}, + "source": [ + "### Composing datasets\n", + "\n", + "We would like to be able to intelligently combine all three datasets in order to train a land cover mapping model. This requires us to create a virtual mosaic of all Landsat scenes, regardless of overlap. This can be done by taking the *union* of both datasets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b5adace-d7c9-4c27-9e53-ae532b081046", + "metadata": {}, + "outputs": [], + "source": [ + "landsat = landsat7 | landsat8\n", + "print(landsat)\n", + "print(landsat.crs)" + ] + }, + { + "cell_type": "markdown", + "id": "ddac6f18-36de-4241-a150-0ee50d0f40dd", + "metadata": {}, + "source": [ + "Similarly, we only want to sample from locations with both input imagery and output masks, not locations with only one or the other. We can achieve this by taking the *intersection* of both datasets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6dd9f067-0e00-47ac-8bc1-6e7cd9e41e4d", + "metadata": {}, + "outputs": [], + "source": [ + "dataset = landsat & cdl\n", + "print(dataset)\n", + "print(dataset.crs)" + ] + }, + { + "cell_type": "markdown", + "id": "48d2afbe-aab8-415e-a0df-fdb0d5209a49", + "metadata": {}, + "source": [ + "Note that all datasets now have the same CRS. When you run this code, you should notice it happen very quickly. TorchGeo hasn't actually created a mosaic yet or reprojected anything, it will do this on the fly for us." + ] + }, + { + "cell_type": "markdown", + "id": "4df7ee26-2c11-4e70-b113-e633fbbc2cd9", + "metadata": {}, + "source": [ + "### Spatiotemporal indexing\n", + "\n", + "How did we do this? TorchGeo uses a data structure called an *R-tree* to store the spatiotemporal bounding box of every file in the dataset. \n", + "\n", + "![R-tree](https://raw.githubusercontent.com/davidmoten/davidmoten.github.io/master/resources/rtree-3d/plot2.png)\n", + "\n", + "TorchGeo extracts the spatial bounding box from the metadata of each file, and the timestamp from the filename. This geospatial and geotemporal metadata allows us to efficiently compute the intersection or union of two datasets. It also lets us quickly retrieve an image and corresponding mask for a particular location in space and time." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3992c571-0a6f-4d28-a2dc-e5915c00901e", + "metadata": {}, + "outputs": [], + "source": [ + "size = 256\n", + "\n", + "xmin = 925000\n", + "xmax = xmin + size * 30\n", + "ymin = 4470000\n", + "ymax = ymin + size * 30\n", + "tmin = datetime(2023, 1, 1).timestamp()\n", + "tmax = datetime(2023, 12, 31).timestamp()\n", + "\n", + "bbox = BoundingBox(xmin, xmax, ymin, ymax, tmin, tmax)\n", + "sample = dataset[bbox]\n", + "\n", + "landsat8.plot(sample)\n", + "cdl.plot(sample)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "bc591543-6d74-47b3-8c24-feada66d0a38", + "metadata": {}, + "source": [ + "TorchGeo uses *windowed-reading* to only read the blocks of memory needed to load a small patch from a large raster tile. It also automatically reprojects all data to the same CRS and resolution (from the first dataset). This can be controlled by explicitly passing `crs` or `res` to the dataset." + ] + }, + { + "cell_type": "markdown", + "id": "e2e4221e-dfb7-4966-96a6-e52400ae266c", + "metadata": {}, + "source": [ + "## Samplers\n", + "\n", + "The above `BoundingBox` makes it easy to index into complex datasets consisting of hundreds of files. However, it is a bit cumbersome to manually construct these queries every time, especially if we want thousands or even millions of bounding boxes. Luckily, TorchGeo provides a `GeoSampler` class to construct these for us." + ] + }, + { + "cell_type": "markdown", + "id": "47a7423d-32a9-40ae-be62-d54805835b19", + "metadata": {}, + "source": [ + "### Random sampling\n", + "\n", + "Usually, at training time, we want the largest possible dataset we can muster. For curated benchmark datasets like `EuroSAT100`, we achieved this by applying data augmentation to artificially inflate the size and diversity of our dataset. For `GeoDataset` objects, we can achieve this using random sampling. It doesn't matter if two or more of our images have partial overlap, as long as they bring unique pixels that help our model learn. \n", + "\n", + "TorchGeo provides a `RandomGeoSampler` to achieve this. We just tell the sampler how large we want each image patch to be (in pixel coordinates or CRS units) and, optionally, the number of image patches per epoch." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36a60164-aa88-4773-a38f-d40960f4bfb2", + "metadata": {}, + "outputs": [], + "source": [ + "train_sampler = RandomGeoSampler(dataset, size=size, length=1000)\n", + "next(iter(train_sampler))" + ] + }, + { + "cell_type": "markdown", + "id": "b6d35b26-edae-46dc-b232-878421faa84d", + "metadata": {}, + "source": [ + "### Gridded sampling\n", + "\n", + "At evaluation time, this actually becomes a problem. We want to make sure we aren't making multiple predictions for the same location. We also want to make sure we don't miss any locations. To achieve this, TorchGeo also provides a `GridGeoSampler`. We can tell the sampler the size of each image patch and the stride of our sliding window." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33340c1a-756f-4ffe-ae3d-c2307fc98d07", + "metadata": {}, + "outputs": [], + "source": [ + "test_sampler = GridGeoSampler(dataset, size=size, stride=size)\n", + "next(iter(test_sampler))" + ] + }, + { + "cell_type": "markdown", + "id": "b9806919-6520-4da6-9eb3-3e1e6a10498e", + "metadata": {}, + "source": [ + "## Data Loaders\n", + "\n", + "All of these abstractions (`GeoDataset` and `GeoSampler`) are fully compatible with all of the rest of PyTorch. We can simply pass them to a data loader like below. Note that we also need the `stack_samples` collation function to convert a list of samples to a mini-batch." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fd44d29d-b7c0-4617-bb94-d41a14e8f54a", + "metadata": {}, + "outputs": [], + "source": [ + "train_dataloader = DataLoader(\n", + " dataset, batch_size=128, sampler=train_sampler, collate_fn=stack_samples\n", + ")\n", + "test_dataloader = DataLoader(\n", + " dataset, batch_size=128, sampler=test_sampler, collate_fn=stack_samples\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "e46e8453-df25-4265-a85b-75dce7dea047", + "metadata": {}, + "source": [ + "Now that we have working data loaders, we can copy-n-paste our training code from the Introduction to PyTorch tutorial. We only need to change our model to one designed for semantic segmentation, such as a U-Net. Every other line of code would be identical to how you would do this in your normal PyTorch workflow." + ] + }, + { + "cell_type": "markdown", + "id": "a3acc64e-8dc0-46b4-a677-ecb9723d4f56", + "metadata": {}, + "source": [ + "## Additional Reading\n", + "\n", + "TorchGeo has plenty of other tutorials and documentation. If you would like to get more insight into the design of TorchGeo, the following external resources are also helpful:\n", + "\n", + "* [TorchGeo: Deep Learning With Geospatial Data](https://arxiv.org/abs/2111.08872)\n", + "* [Geospatial deep learning with TorchGeo](https://pytorch.org/blog/geospatial-deep-learning-with-torchgeo/)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "getting_started.ipynb", + "provenance": [] + }, + "execution": { + "timeout": 1200 + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}