Merge pull request #2 from geo-smart/main
Sync old repo
StefanTodoran authored May 8, 2024
2 parents d07dd03 + 29e9a7f commit 8e879d5
Showing 46 changed files with 10,810 additions and 1,198 deletions.
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,10 +1,15 @@
# Ignore the massive dataset
book/data/AirborneData.mat
book/data/Elwha2012.mat
book/data/Elwha2014.mat
book/data/Elwha2012Mini.mat
book/out/
book/data/img
book/data/model
book/learning
.DS_Store
.vscode
**/*.hf
**/*.pth

# Jupyter Book things
.bash_history
37 changes: 29 additions & 8 deletions README.md
@@ -1,32 +1,53 @@
# Elwha Dataset Realignment
# Elwha Segmentation

[![Deploy](https://github.com/StefanTodoran/elwha_dataset_realignment/actions/workflows/deploy.yaml/badge.svg)](https://github.com/StefanTodoran/elwha_dataset_realignment/actions/workflows/deploy.yaml)
[![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://todoran.dev/elwha_dataset_realignment/)
[![GeoSMART Use Case](./book/img/use_case_badge.svg)](https://geo-smart.github.io/usecases)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/StefanTodoran/elwha_dataset_realignment/HEAD)
[![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/StefanTodoran/elwha_dataset_realignment)

This repository stores a computer vision workflow for aligning offset image datasets which describe the same real-world objects but are in different colorspaces.
This repository stores a computer vision workflow focused on image processing of remote sensing imagery from the Elwha river. The first portion relates to realigning offset image datasets, and the second portion involves fine-tuning Meta's SAM for bird's eye view river pixel segmentation.

## Introduction
<img src="book/img/thumbnail.png">

## Dataset Realignment

This repository contains a Python workflow for reconstructing misaligned image datasets that span multiple colorspaces, and the application of these techniques to a specific dataset.

The dataset in question is made up of 812 RGB and IR aerial photographs taken from a plane flown over the Elwha river in 2012. The purpose of this project is to prepare the dataset for more advanced computer vision processing like cold water refuge mapping.

## Problem Statement
### Problem Statement

In order for more advanced processing such as classification tasks to take place, there is a need to know, for any given pixel in any given image, both the RGB and IR data at that point. Unfortunately, the IR images are not only misaligned with the RGB images, they are also at a different scale and were shot with different camera settings/properties.
In order for more advanced processing tasks such as segmentation and classification to take place, there is a need to know, for any given pixel in any given image, both the RGB and IR data at that point. Unfortunately, the IR images are not only misaligned with the RGB images, they are also at a different scale and were shot with different camera settings/properties.

See figure 1 on alignment below. In order to (roughly) match the IR image to the RGB image, the IR image had to be shrunk, despite the fact that the RGB and IR images seemingly have the same size of `640x480`. There is also still some distortion at the edges of the image.

<img src="book/img/alignment2.gif" width="360"/>
Figure 1 | Figure 2
--- | ---
<img src="book/img/alignment2.gif"/> | <img src="book/img/alignment.png"/>

To add even more complexity, images within the RGB and IR image sets are not all distinct, but rather overlap to a large degree, and the plane's flight trajectory means that from one image to the next we see rotation, translation, and scale variance at the same time. See figure 2 below comparing `airborne_1.png` and `airborne_2.png`. Zone `A` is perfectly matched, which means zone `B` is imperfectly matched and zone `C` is completely misaligned. No matter where one attempts to match the images, without any projection it is impossible to align the two images everywhere. Therefore, some sort of affine transformation will be necessary.

<img src="book/img/alignment.png" width="360"/>
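To give a concrete sense of the transformations involved, below is a minimal sketch of affine alignment between two overlapping images using OpenCV feature matching. It is an illustration rather than the workflow's exact code; the feature detector, match filtering, and parameters are assumptions.

```python
import cv2
import numpy as np

# Load two overlapping frames (grayscale simplifies feature matching).
img1 = cv2.imread("airborne_1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("airborne_2.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB keypoints and descriptors in both images.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors and keep the strongest correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:100]

src = np.float32([kp1[m.queryIdx].pt for m in matches])
dst = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate a partial affine transform (rotation, translation, uniform scale)
# with RANSAC to reject bad matches, then warp one image onto the other.
matrix, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
aligned = cv2.warpAffine(img1, matrix, (img2.shape[1], img2.shape[0]))
```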

## Serving
## River Segmentation

This repository contains a Python workflow for fine-tuning public checkpoints of Meta's Segment Anything Model (SAM) for the purpose of automatically segmenting river pixels. There is a focus on experimental rigour, justifying all pipeline decisions and hyperparameters with experimental comparison.
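As a starting point, a public checkpoint can be loaded in a few lines. The sketch below assumes the HuggingFace `transformers` implementation of SAM and the `facebook/sam-vit-base` checkpoint; the layers actually frozen during fine-tuning may differ.

```python
from transformers import SamModel, SamProcessor

# Load a public SAM checkpoint and its matching processor.
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base")

# One common fine-tuning setup (an assumption, not necessarily this
# workflow's): freeze the image and prompt encoders and train only the
# lightweight mask decoder.
for name, param in model.named_parameters():
    if name.startswith(("vision_encoder", "prompt_encoder")):
        param.requires_grad_(False)
```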

### Dataset

The dataset is adapted from Daniel Buscombe's 2023 [publication](https://zenodo.org/records/10155783), although only the RGB images are maintained. Only about 200 of the original 4000+ images are used, and all of the ground truth masks have been replaced. This smaller version of the dataset with our improved segmentation masks can be found on HuggingFace at [stodoran/elwha-segmentation](https://huggingface.co/datasets/stodoran/elwha-segmentation). Below is a comparison of the original ground truth masks from Buscombe's dataset and our replacements.

<img src="book/img/mask_fix_1.png" width="360">
<img src="book/img/mask_fix_2.png" width="360">
<img src="book/img/mask_fix_3.png" width="360">

The improved mask quality results in better fine-tuning performance, despite the vastly smaller dataset size. As visible in the samples, for some masks the correction is fairly minor; however, some masks in the original dataset accurately classify less than 10% of water pixels.
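Since the dataset lives on the HuggingFace Hub, it can be pulled directly with the `datasets` library. A minimal sketch; the `image`/`label` columns and train/validation splits follow the upload notebook in this repository.

```python
from datasets import load_dataset

# Download (and cache) the dataset from the HuggingFace Hub.
dataset = load_dataset("stodoran/elwha-segmentation")

example = dataset["train"][0]
image, mask = example["image"], example["label"]  # both are PIL images
```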

## Development Instructions

While image processing notebooks should likely be run locally, the fine-tuning notebook is set up so that it can be run in Google Colab. Since the notebook uses HuggingFace Accelerate, if launched on a VM with multiple GPUs the training can be distributed across all of them.
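For reference, a minimal sketch of a multi-GPU launch from a notebook with Accelerate; `train_loop` is a placeholder for the notebook's actual training function, and `num_processes` should match the number of available GPUs.

```python
from accelerate import notebook_launcher

def train_loop():
    # Placeholder: model, optimizer, and dataloader setup wrapped by
    # accelerate.Accelerator goes here.
    ...

# Spawns one process per GPU and runs train_loop on each.
notebook_launcher(train_loop, num_processes=2)
```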

### Serving Locally

Activate the `elwha_env` conda environment. Navigate to the root folder of the repository in an Anaconda prompt. Run `python server.py`.

2 changes: 2 additions & 0 deletions book/about.md
@@ -10,5 +10,7 @@ This book is a contribution to the GeoSMART use case library, a collection of bo

The dataset of images from the Elwha river basin is quite large, so it is not included in the repository. If you'd like to try the workflow yourself, you will need to download it from [here](https://www.dropbox.com/s/qkr9712m8jt3zft/AirborneData.mat?dl=0).
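Once downloaded, the file can be inspected with SciPy. A minimal sketch, assuming the file is saved to `book/data/` as in the repository's `.gitignore`; a MATLAB v7.3 file would instead need `h5py`.

```python
from scipy.io import loadmat

# Load the MATLAB file and list the variables it contains.
data = loadmat("book/data/AirborneData.mat")
print(data.keys())
```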

<!-- TODO, fix the dropbox link -->

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/StefanTodoran/elwha_dataset_realignment/HEAD)
[![Open in Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/StefanTodoran/elwha_dataset_realignment)
253 changes: 253 additions & 0 deletions book/chapters/huggingface.ipynb
@@ -0,0 +1,253 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from datasets import Dataset, DatasetDict, Image\n",
"from util import TLDataset, readSetFromFile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the full dataset:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# train_data = TLDataset(\n",
"# \"../learning/\", \"imagery/\", \"masks/\", \"*_corrected.png\", \n",
"# subset=\"Train\", fraction=0.1, seed=1,\n",
"# )\n",
"\n",
"# validation_data = TLDataset(\n",
"# \"../learning/\", \"imagery/\", \"masks/\", \"*_corrected.png\", \n",
"# subset=\"Test\", fraction=0.1, seed=1,\n",
"# )\n",
"\n",
"# image_paths_train = [str(path) for path in train_data.image_names]\n",
"# label_paths_train = [str(path) for path in train_data.mask_names]\n",
"\n",
"# image_paths_validation = [str(path) for path in validation_data.image_names]\n",
"# label_paths_validation = [str(path) for path in validation_data.mask_names]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the tiny dataset:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found and loaded 4382 images with glob *_corrected.png.\n",
"Subset of 4382 ground truth segmentation masks marked for Train.\n",
"Tiny dataset train: 198 images\n",
"Tiny dataset validation: 22 images\n"
]
}
],
"source": [
"all_data = TLDataset(\n",
" \"../learning/\", \"imagery/\", \"masks/\", \"*_corrected.png\", \n",
" subset=\"Train\", fraction=0, seed=1, # We want to select every image\n",
")\n",
"\n",
"indices = readSetFromFile(\"../data/useful_images.txt\")\n",
"names = [all_data[index][\"name\"].replace(\".png\", \"\") for index in indices]\n",
"\n",
"cutoff = int(len(names) / 10) # fraction=0.1\n",
"train_names = names[cutoff:]\n",
"val_names = names[:cutoff]\n",
"\n",
"image_paths_train = [f\"../learning/imagery/{name}.png\" for name in train_names]\n",
"label_paths_train = [f\"../learning/masks/{name}_corrected.png\" for name in train_names]\n",
"\n",
"image_paths_validation = [f\"../learning/imagery/{name}.png\" for name in val_names]\n",
"label_paths_validation = [f\"../learning/masks/{name}_corrected.png\" for name in val_names]\n",
"\n",
"print(f\"Tiny dataset train: {len(train_names)} images\")\n",
"print(f\"Tiny dataset validation: {len(val_names)} images\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dd26e5f40ee648f19e7c720a0845ffce",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3e6aef8bc7774ba8ab294a718a0afa56",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Map: 0%| | 0/198 [00:00<?, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f76052cf75eb4e9cbec04f13555c99c1",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Creating parquet from Arrow format: 0%| | 0/2 [00:00<?, ?ba/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "30c3fae17e094ca491b9de7fb022e611",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f6993e1878e0488589e5a232923f06e3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Map: 0%| | 0/22 [00:00<?, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f5ff8c98b7db43059f68c567f0250ddb",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Creating parquet from Arrow format: 0%| | 0/1 [00:00<?, ?ba/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5a3b9be6c6d641e9a2182c84cc7447a9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"README.md: 0%| | 0.00/434 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"CommitInfo(commit_url='https://huggingface.co/datasets/stodoran/elwha-segmentation-tiny/commit/b03c4bd87b3924a54828adfd01b0cf689a9aef07', commit_message='Upload dataset', commit_description='', oid='b03c4bd87b3924a54828adfd01b0cf689a9aef07', pr_url=None, pr_revision=None, pr_num=None)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def createDataset(image_paths, label_paths):\n",
" dataset = Dataset.from_dict({\"image\": sorted(image_paths), \"label\": sorted(label_paths)})\n",
" dataset = dataset.cast_column(\"image\", Image())\n",
" dataset = dataset.cast_column(\"label\", Image())\n",
"\n",
" return dataset\n",
"\n",
"train_dataset = createDataset(image_paths_train, label_paths_train)\n",
"validation_dataset = createDataset(image_paths_validation, label_paths_validation)\n",
"\n",
"dataset = DatasetDict({\n",
" \"train\": train_dataset,\n",
" \"validation\": validation_dataset,\n",
"})\n",
"\n",
"# This function assumes you have ran the huggingface-cli login command in a terminal/notebook\n",
"dataset.push_to_hub(\"stodoran/elwha-segmentation\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[stodoran/elwha-segmentation-large](https://huggingface.co/datasets/stodoran/elwha-segmentation-large/tree/main)<br>\n",
"[stodoran/elwha-segmentation-tiny](https://huggingface.co/datasets/stodoran/elwha-segmentation-tiny/tree/main)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "elwha_env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}