
Add GlacierMappingAlps dataset #2508

Open
dcodrut wants to merge 10 commits into main
Conversation

@dcodrut commented Jan 9, 2025

This PR adds a Multi-modal Dataset for Glacier Mapping (Segmentation) in the European Alps.

The dataset consists of Sentinel-2 images from 2015 (mainly), 2016 and 2017, and binary segmentation masks for glaciers, based on an inventory built by glaciology experts (Paul et al. 2020).
Given that glacier ice is not always visible in the images, due to seasonal snow, shadow/cloud cover and, most importantly, debris cover, the dataset also includes additional features that can help in the segmentation task.

Here's a sample extracted with the plot function implemented in the dataset:
[figure: sample patch showing the image, the glacier mask, and the additional features]

A preprint is available here that describes in detail how the dataset was constructed.
For a shorter description, see also: https://huggingface.co/datasets/dcodrut/dl4gam_alps.
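
For reference, a minimal usage sketch. It only uses names visible in this PR (the class, its version argument, and the plot method) with their proposed defaults, assumes the data is already available under the default root, and may still change during review:

```python
from torchgeo.datasets import GlacierMappingAlps

# Sketch only: relies on the defaults proposed in this PR ('small' version,
# RGB+NIR+SWIR bands) and on the data already being present locally.
ds = GlacierMappingAlps(version='small')
print(len(ds))         # number of patches in the selected split
sample = ds[0]         # dict of tensors: image, glacier mask, cloud/shadow mask
fig = ds.plot(sample)  # the plot method used to produce the figure above
```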

@github-actions bot added the documentation, datasets, and testing labels on Jan 9, 2025
@nilsleh (Collaborator) left a comment

Thanks a lot for the contribution :) I hope this first round of review is able to resolve the import xarray error that is preventing the unit tests from running. Once they are working, I will take another look for anything else!

5 resolved review threads on torchgeo/datasets/glacier_mapping_alps.py (2 marked outdated)
@nilsleh added this to the 0.7.0 milestone on Jan 9, 2025
@github-actions bot added the dependencies label on Jan 9, 2025
@dcodrut (Author) commented Jan 9, 2025

@microsoft-github-policy-service agree

@dcodrut (Author) commented Jan 9, 2025

> Thanks a lot for the contribution :) I hope this first round of review is able to resolve the import xarray error that is preventing the unit tests from running. Once they are working, I will take another look for anything else!

I hope it will be useful in some way. Happy to contribute and learn new things on the way :)
Thanks for the comments and please let me know if there are additional things I should change.

@adamjstewart (Collaborator) commented

I'm completely fine with adding xarray and netcdf4 as optional dependencies. We've been talking about adding xarray support for a long time. We're still not sure if it will be directly through xarray or something like rioxarray, but that doesn't matter for this PR.

@adamjstewart (Collaborator) commented

Can you resolve the merge conflicts so the tests run?

@adamjstewart (Collaborator) commented Jan 15, 2025

Can you resolve the merge conflicts? These requirements files unfortunately get updated a lot, causing merge issues.

@adamjstewart (Collaborator) left a comment

Code looks fantastic, just a few minor formatting comments. Thanks for taking the time to contribute your dataset to TorchGeo, let's hope it makes it easier for people to actually use and do cool science with!

@@ -7,3 +7,5 @@ pycocotools==2.0.8
 pyvista==0.44.2
 scikit-image==0.25.0
 scipy==1.15.0
+xarray==2024.11.0
+netcdf4==1.7.2

Let's keep these in alphabetical order


These also need to be added to pyproject.toml. We'll need to find minimum versions that allow the tests to pass. I can help with this if you want.

)


class GlacierMappingAlps(NonGeoDataset):

Should we rename the dataset to DL4GAM or DL4GAMAlps? Or is DL4GAM the project/model and GlacierMappingAlps is the name of the dataset? It's up to you, it's your dataset.

class GlacierMappingAlps(NonGeoDataset):
r"""A Multi-modal Dataset for Glacier Mapping (Segmentation) in the European Alps.

The dataset consists of Sentinel-2 images from 2015 (mainly), 2016 and 2017, and binary segmentation masks for

Would be nice if we could limit all lines to 88 chars, but if ruff doesn't complain it isn't technically required

Comment on lines +78 to +79
* `xarray <https://docs.xarray.dev/en/stable/getting-started-guide/installing.html>`_
* `netcdf4 <https://unidata.github.io/netcdf4-python/>`_

I usually just link to the PyPI homepage

checksum: if True, check the MD5 of the downloaded files (may be slow)

Raises:
AssertionError: if the ``split``, ``cv_iter``, ``version``, ``bands`` or ``extra_features`` are invalid

I usually use *foo* instead of backticks because that's what the Python docs uses. Could also be less specific and just say "if any parameters are invalid".

return len(self.fp_patches)

def __getitem__(self, index: int) -> dict[str, Tensor]:
"""It loads the netcdf file for the given index and returns the sample as a dict.

Suggested change:
- """It loads the netcdf file for the given index and returns the sample as a dict.
+ """Load the NetCDF file for the given index and return the sample as a dict.

* the cloud and shadow mask
* the additional features (DEM, derived features, etc.) if required
"""
xr = lazy_import('xarray')

For other datasets, we added a lazy_import to the __init__ just so that the dataset would fail more quickly.
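
Roughly, the pattern being suggested looks like this trimmed-down sketch (not the PR's actual class; it assumes TorchGeo's lazy_import helper is importable from torchgeo.datasets.utils):

```python
from torchgeo.datasets import NonGeoDataset
from torchgeo.datasets.utils import lazy_import


class LazyImportSketch(NonGeoDataset):
    """Trimmed-down sketch of the suggestion, not the actual dataset code."""

    def __init__(self) -> None:
        # A missing xarray now raises here, at construction time, instead of
        # only on the first __getitem__ call.
        lazy_import('xarray')
        self.fp_patches: list[str] = []

    def __len__(self) -> int:
        return len(self.fp_patches)

    def __getitem__(self, index: int) -> dict:
        xr = lazy_import('xarray')  # same call as in the PR; returns the module
        return {'backend': xr.__name__}
```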

version: str = 'small',
bands: Sequence[str] = rgb_nir_swir_bands,
extra_features: Sequence[str] | None = None,
transforms: Callable[[dict[str, Tensor]], dict[str, Tensor]] | None = None,

These aren't currently being used in __getitem__
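
The usual fix is to apply them at the end of __getitem__; a rough sketch follows (the _load_sample helper is a placeholder for whatever code in this PR actually builds the sample dict):

```python
def __getitem__(self, index: int) -> dict[str, Tensor]:
    """Load the NetCDF file for the given index and return the sample as a dict."""
    sample = self._load_sample(index)  # placeholder for the PR's loading code

    # Apply the user-provided transforms, if any, before returning the sample.
    if self.transforms is not None:
        sample = self.transforms(sample)

    return sample
```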

@@ -21,6 +21,7 @@ Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands
 `Forest Damage`_,OD,Drone imagery,"CDLA-Permissive-1.0","1,543",4,"1,500x1,500",,RGB
 `GeoNRW`_,S,Aerial,"CC-BY-4.0","7,783",11,"1,000x1,000",1,"RGB, DEM"
 `GID-15`_,S,Gaofen-2,-,150,15,"6,800x7,200",3,RGB
+`Glacier Mapping Alps`_,S,"Sentinel-2","CC-BY-4.0","2,251 or 11,440","2","256x256","10","MSI"

Confirmed the license

Labels: datasets, dependencies, documentation, testing
3 participants