Add CaFFe Dataset (#2350)
* glacier ds

* typo

* quick review

* ordinal class map and plotting

* rename to caffe

* more rename

* datamodule test without trainer

* forgot a rename

* test for dm

* test loader

* docs target dataset name

* mask class values

* requests and plotting colors
nilsleh authored Oct 23, 2024
1 parent 607b3c5 commit 67fe198
Showing 38 changed files with 561 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/api/datamodules.rst
@@ -62,6 +62,11 @@ CaBuAr

.. autoclass:: CaBuArDataModule

CaFFe
^^^^^

.. autoclass:: CaFFeDataModule

ChaBuD
^^^^^^

5 changes: 5 additions & 0 deletions docs/api/datasets.rst
@@ -222,6 +222,11 @@ CaBuAr

.. autoclass:: CaBuAr

CaFFe
^^^^^

.. autoclass:: CaFFe

ChaBuD
^^^^^^

1 change: 1 addition & 0 deletions docs/api/datasets/non_geo_datasets.csv
@@ -4,6 +4,7 @@ Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands
`BigEarthNet`_,C,Sentinel-1/2,"CDLA-Permissive-1.0","590,326",19--43,120x120,10,"SAR, MSI"
`BioMassters`_,R,Sentinel-1/2 and Lidar,"CC-BY-4.0",,,256x256, 10, "SAR, MSI"
`CaBuAr`_,CD,Sentinel-2,"OpenRAIL",424,2,512x512,20,MSI
`CaFFe`_,S,"Sentinel-1, TerraSAR-X, TanDEM-X, ENVISAT, ERS-1/2, ALOS PALSAR, and RADARSAT-1","CC-BY-4.0","19092","2 or 4","512x512",6-20,"SAR"
`ChaBuD`_,CD,Sentinel-2,"OpenRAIL",356,2,512x512,10,MSI
`Cloud Cover Detection`_,S,Sentinel-2,"CC-BY-4.0","22,728",2,512x512,10,MSI
`COWC`_,"C, R","CSUAV AFRL, ISPRS, LINZ, AGRC","AGPL-3.0-only","388,435",2,256x256,0.15,RGB
Binary file added tests/data/caffe/caffe.zip
Binary file not shown.
80 changes: 80 additions & 0 deletions tests/data/caffe/data.py
@@ -0,0 +1,80 @@
#!/usr/bin/env python3

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import hashlib
import os
import shutil

import numpy as np
from PIL import Image

# Define the root directory and subdirectories
root_dir = 'caffe'
sub_dirs = ['zones', 'sar_images', 'fronts']
splits = ['train', 'val', 'test']

zone_file_names = [
    'Crane_2002-11-09_ERS_20_2_061_zones__93_102_0_0_0.png',
    'Crane_2007-09-22_ENVISAT_20_1_467_zones__93_102_8_1024_0.png',
    'JAC_2015-12-23_TSX_6_1_005_zones__57_49_195_384_1024.png',
]

IMG_SIZE = 32


# Function to create dummy images
def create_dummy_image(path: str, shape: tuple[int], pixel_values: list[int]) -> None:
    data = np.random.choice(pixel_values, size=shape, replace=True).astype(np.uint8)
    img = Image.fromarray(data)
    img.save(path)


def create_zone_images(split: str, filename: str) -> None:
    zone_pixel_values = [0, 64, 127, 254]
    path = os.path.join(root_dir, 'zones', split, filename)
    create_dummy_image(path, (IMG_SIZE, IMG_SIZE), zone_pixel_values)


def create_sar_images(split: str, filename: str) -> None:
    sar_pixel_values = range(256)
    path = os.path.join(root_dir, 'sar_images', split, filename)
    create_dummy_image(path, (IMG_SIZE, IMG_SIZE), sar_pixel_values)


def create_front_images(split: str, filename: str) -> None:
    front_pixel_values = [0, 255]
    path = os.path.join(root_dir, 'fronts', split, filename)
    create_dummy_image(path, (IMG_SIZE, IMG_SIZE), front_pixel_values)


if os.path.exists(root_dir):
    shutil.rmtree(root_dir)

# Create the directory structure
for sub_dir in sub_dirs:
    for split in splits:
        os.makedirs(os.path.join(root_dir, sub_dir, split), exist_ok=True)

# Create dummy data for all splits and filenames
for split in splits:
    for filename in zone_file_names:
        create_zone_images(split, filename)
        create_sar_images(split, filename.replace('_zones_', '_'))
        create_front_images(split, filename.replace('_zones_', '_front_'))

# zip and compute md5
shutil.make_archive(root_dir, 'zip', '.', root_dir)


def md5(fname: str) -> str:
    hash_md5 = hashlib.md5()
    with open(fname, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


md5sum = md5('caffe.zip')
print(f'MD5 checksum: {md5sum}')
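The script above derives each SAR image name and front mask name from the corresponding zone mask name with `str.replace`. A minimal sketch of that naming relationship, using only the standard library; the helper `derive_names` is hypothetical and not part of the commit:

```python
def derive_names(zone_name: str) -> tuple[str, str]:
    """Return the (SAR image, front mask) names matching a zone mask name.

    Hypothetical helper that mirrors the str.replace calls in data.py.
    """
    sar_name = zone_name.replace('_zones_', '_')
    front_name = zone_name.replace('_zones_', '_front_')
    return sar_name, front_name


zone = 'Crane_2002-11-09_ERS_20_2_061_zones__93_102_0_0_0.png'
sar, front = derive_names(zone)
print(sar)
print(front)
```

Because the zone names contain `_zones__` (two trailing underscores), the derived SAR name keeps a double underscore before the trailing numeric fields.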
42 changes: 42 additions & 0 deletions tests/datamodules/test_caffe.py
@@ -0,0 +1,42 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import os

import matplotlib.pyplot as plt
import pytest

from torchgeo.datamodules import CaFFeDataModule


class TestCaFFeDataModule:
    @pytest.fixture
    def datamodule(self) -> CaFFeDataModule:
        root = os.path.join('tests', 'data', 'caffe')
        batch_size = 2
        num_workers = 0
        dm = CaFFeDataModule(root=root, batch_size=batch_size, num_workers=num_workers)
        return dm

    def test_train_dataloader(self, datamodule: CaFFeDataModule) -> None:
        datamodule.setup('fit')
        next(iter(datamodule.train_dataloader()))

    def test_val_dataloader(self, datamodule: CaFFeDataModule) -> None:
        datamodule.setup('validate')
        next(iter(datamodule.val_dataloader()))

    def test_test_dataloader(self, datamodule: CaFFeDataModule) -> None:
        datamodule.setup('test')
        next(iter(datamodule.test_dataloader()))

    def test_plot(self, datamodule: CaFFeDataModule) -> None:
        datamodule.setup('validate')
        batch = next(iter(datamodule.val_dataloader()))
        sample = {
            'image': batch['image'][0],
            'mask_zones': batch['mask_zones'][0],
            'mask_front': batch['mask_front'][0],
        }
        datamodule.plot(sample)
        plt.close()
72 changes: 72 additions & 0 deletions tests/datasets/test_caffe.py
@@ -0,0 +1,72 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import os
import shutil
from pathlib import Path

import matplotlib.pyplot as plt
import pytest
import torch
import torch.nn as nn
from _pytest.fixtures import SubRequest
from pytest import MonkeyPatch

from torchgeo.datasets import CaFFe, DatasetNotFoundError


class TestCaFFe:
    @pytest.fixture(params=['train', 'val', 'test'])
    def dataset(
        self, monkeypatch: MonkeyPatch, tmp_path: Path, request: SubRequest
    ) -> CaFFe:
        md5 = '73c0aba603c356b2cce9ebf952fb7be0'
        monkeypatch.setattr(CaFFe, 'md5', md5)
        url = os.path.join('tests', 'data', 'caffe', 'caffe.zip')
        monkeypatch.setattr(CaFFe, 'url', url)
        root = tmp_path
        split = request.param
        transforms = nn.Identity()
        return CaFFe(root, split, transforms, download=True, checksum=True)

    def test_getitem(self, dataset: CaFFe) -> None:
        x = dataset[0]
        assert isinstance(x, dict)
        assert isinstance(x['image'], torch.Tensor)
        assert x['image'].shape[0] == 1
        assert isinstance(x['mask_zones'], torch.Tensor)
        assert x['image'].shape[-2:] == x['mask_zones'].shape[-2:]

    def test_len(self, dataset: CaFFe) -> None:
        if dataset.split == 'train':
            assert len(dataset) == 3
        else:
            assert len(dataset) == 3

    def test_already_downloaded(self, dataset: CaFFe) -> None:
        CaFFe(root=dataset.root)

    def test_not_yet_extracted(self, tmp_path: Path) -> None:
        filename = 'caffe.zip'
        dir = os.path.join('tests', 'data', 'caffe')
        shutil.copyfile(
            os.path.join(dir, filename), os.path.join(str(tmp_path), filename)
        )
        CaFFe(root=str(tmp_path))

    def test_invalid_split(self) -> None:
        with pytest.raises(AssertionError):
            CaFFe(split='foo')

    def test_not_downloaded(self, tmp_path: Path) -> None:
        with pytest.raises(DatasetNotFoundError, match='Dataset not found'):
            CaFFe(tmp_path)

    def test_plot(self, dataset: CaFFe) -> None:
        dataset.plot(dataset[0], suptitle='Test')
        plt.close()

        sample = dataset[0]
        sample['prediction'] = torch.clone(sample['mask_zones'])
        dataset.plot(sample, suptitle='Prediction')
        plt.close()
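The dummy zone masks generated in tests/data/caffe/data.py use the raw pixel values 0, 64, 127, and 254, and the commit message mentions an "ordinal class map". The dataset source is not shown on this page, so the following is only a sketch of how such a remapping could work; the dictionary `ORDINAL_MAP` and the function `to_class_indices` are assumptions for illustration, not the CaFFe implementation:

```python
# Assumed mapping from raw zone-mask pixel values to ordinal class indices.
ORDINAL_MAP = {0: 0, 64: 1, 127: 2, 254: 3}


def to_class_indices(mask: list[list[int]]) -> list[list[int]]:
    """Map each raw pixel value in a 2D mask to its class index.

    Raises KeyError for any value outside the assumed mapping.
    """
    return [[ORDINAL_MAP[value] for value in row] for row in mask]


raw_mask = [[0, 64], [127, 254]]
class_mask = to_class_indices(raw_mask)
print(class_mask)  # [[0, 1], [2, 3]]
```

Contiguous indices of this kind are what segmentation losses typically expect as targets, which is one plausible reason for remapping the raw pixel values.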
2 changes: 2 additions & 0 deletions torchgeo/datamodules/__init__.py
@@ -6,6 +6,7 @@
from .agrifieldnet import AgriFieldNetDataModule
from .bigearthnet import BigEarthNetDataModule
from .cabuar import CaBuArDataModule
from .caffe import CaFFeDataModule
from .chabud import ChaBuDDataModule
from .chesapeake import ChesapeakeCVPRDataModule
from .cowc import COWCCountingDataModule
@@ -67,6 +68,7 @@
    'SouthAfricaCropTypeDataModule',
    # NonGeoDataset
    'BigEarthNetDataModule',
    'CaFFeDataModule',
    'CaBuArDataModule',
    'ChaBuDDataModule',
    'COWCCountingDataModule',
55 changes: 55 additions & 0 deletions torchgeo/datamodules/caffe.py
@@ -0,0 +1,55 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

"""CaFFe datamodule."""

from typing import Any

import kornia.augmentation as K
import torch

from ..datasets import CaFFe
from ..transforms import AugmentationSequential
from .geo import NonGeoDataModule


class CaFFeDataModule(NonGeoDataModule):
    """LightningDataModule implementation for the CaFFe dataset.

    Implements the default splits that come with the dataset.

    .. versionadded:: 0.7
    """

    mean = torch.Tensor([0.5517])
    std = torch.Tensor([11.8478])

    def __init__(
        self, batch_size: int = 64, num_workers: int = 0, size: int = 512, **kwargs: Any
    ) -> None:
        """Initialize a new CaFFeDataModule instance.

        Args:
            batch_size: Size of each mini-batch.
            num_workers: Number of workers for parallel data loading.
            size: resize images of input size 512x512 to size x size
            **kwargs: Additional keyword arguments passed to
                :class:`~torchgeo.datasets.CaFFe`.
        """
        super().__init__(CaFFe, batch_size, num_workers, **kwargs)

        self.size = size

        self.train_aug = AugmentationSequential(
            K.Normalize(mean=self.mean, std=self.std),
            K.Resize(size),
            K.RandomHorizontalFlip(p=0.5),
            K.RandomVerticalFlip(p=0.5),
            data_keys=['image', 'mask'],
        )

        self.aug = AugmentationSequential(
            K.Normalize(mean=self.mean, std=self.std),
            K.Resize(size),
            data_keys=['image', 'mask'],
        )
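`K.Normalize` standardizes each pixel as (x - mean) / std. A small worked sketch of that arithmetic with the single-channel statistics defined on the class; the function `normalize` is a hypothetical plain-Python stand-in for illustration, not the Kornia call itself:

```python
MEAN = 0.5517   # value of CaFFeDataModule.mean
STD = 11.8478   # value of CaFFeDataModule.std


def normalize(pixel: float, mean: float = MEAN, std: float = STD) -> float:
    """Standardize one pixel value: subtract the mean, divide by the std."""
    return (pixel - mean) / std


# A pixel equal to the mean maps to zero; one std above the mean maps to ~1.
at_mean = normalize(MEAN)
one_std_above = normalize(MEAN + STD)
print(at_mean, one_std_above)
```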
2 changes: 2 additions & 0 deletions torchgeo/datasets/__init__.py
@@ -12,6 +12,7 @@
from .bigearthnet import BigEarthNet
from .biomassters import BioMassters
from .cabuar import CaBuAr
from .caffe import CaFFe
from .cbf import CanadianBuildingFootprints
from .cdl import CDL
from .chabud import ChaBuD
@@ -205,6 +206,7 @@
    'BigEarthNet',
    'BioMassters',
    'CaBuAr',
    'CaFFe',
    'ChaBuD',
    'CloudCoverDetection',
    'COWC',