GeoSMART Hackweek Progress #1

Merged
merged 13 commits into from
Oct 27, 2023
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Ignore the massive dataset
book/data/AirborneData.mat
book/out
book/out/
book/data/img
book/data/model
.DS_Store
859 changes: 740 additions & 119 deletions book/chapters/masking.ipynb

Large diffs are not rendered by default.

82 changes: 47 additions & 35 deletions book/chapters/minify_skip.ipynb
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
@@ -38,18 +38,19 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"airborne_data_path = \"../data/AirborneData.mat\"\n",
"assert os.path.exists(airborne_data_path)\n",
"airborne_data = scipy.io.loadmat(airborne_data_path)"
"airborne_data = scipy.io.loadmat(airborne_data_path)\n",
"original_size = os.path.getsize(airborne_data_path) "
]
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": 66,
"metadata": {},
"outputs": [
{
@@ -66,7 +67,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": 67,
"metadata": {},
"outputs": [
{
@@ -90,12 +91,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It is unclear if any of these are important, but they don't contibute much to the file size and don't get in our way so no reason to bother ourselves with removing them."
"It is unclear if any of these will prove important to our data processing, but they don't contibute almost anything to the file size and don't get in our way so no reason to bother ourselves with removing them."

]
},
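Rather than taking that claim on faith, the per-key memory footprint is easy to tally. A minimal sketch, assuming the usual `loadmat` output of NumPy arrays plus a few metadata strings (the `report_key_sizes` helper is ours, not part of the notebook):

```python
import numpy as np

def report_key_sizes(mat_dict):
    """Print the approximate in-memory footprint of each key in a loadmat dict."""
    for key, value in mat_dict.items():
        if isinstance(value, np.ndarray):
            print(f"{key}: {value.nbytes / 1e6:.1f} MB")
        else:
            # __header__, __version__, __globals__ are tiny metadata entries.
            print(f"{key}: metadata (negligible)")

report_key_sizes(airborne_data)
```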
{
"cell_type": "code",
"execution_count": 38,
"execution_count": 68,
"metadata": {},
"outputs": [
{
@@ -127,7 +128,7 @@
},
{
"cell_type": "code",
"execution_count": 39,
"execution_count": 69,
"metadata": {},
"outputs": [
{
@@ -143,10 +144,6 @@
"<class 'numpy.ndarray'>\n",
"(406, 5)\n",
"\n",
"altitude:\n",
"<class 'numpy.ndarray'>\n",
"(406, 1)\n",
"\n",
"datePDT:\n",
"<class 'numpy.ndarray'>\n",
"(406,)\n"
@@ -162,10 +159,6 @@
"print(type(airborne_data['tempRiver']))\n",
"print(airborne_data['tempRiver'].shape)\n",
"\n",
"print('\\naltitude:')\n",
"print(type(airborne_data['altitude']))\n",
"print(airborne_data['altitude'].shape)\n",
"\n",
"print('\\ndatePDT:')\n",
"print(type(airborne_data['datePDT']))\n",
"print(airborne_data['datePDT'].shape)"
@@ -176,12 +169,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As far as I know, `maskRiver` and `tempRiver` are abandoned old work from a few years ago when another researched tried to do some processing on this dataset. Trimming them will greatly reduce the dataset size. The `altitude` data may come in handy later, but at the moment it isn't useful to us."
"It seems `maskRiver` and `tempRiver` are abandoned old work from a few years ago when another researched tried to do some processing on this dataset. Trimming them will greatly reduce the dataset size."
]
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": 70,
"metadata": {},
"outputs": [
{
@@ -190,15 +183,14 @@
"''"
]
},
"execution_count": 40,
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"airborne_data.pop('maskRiver')\n",
"airborne_data.pop('tempRiver')\n",
"airborne_data.pop('altitude')\n",
"airborne_data.pop('datePDT')\n",
";"
]
@@ -213,7 +205,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": 71,
"metadata": {},
"outputs": [
{
@@ -267,7 +259,7 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": 72,
"metadata": {},
"outputs": [
{
@@ -276,14 +268,14 @@
"''"
]
},
"execution_count": 42,
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"airborne_data.pop('northings')\n",
"airborne_data.pop('eastings')\n",
"# airborne_data.pop('northings')\n",
"# airborne_data.pop('eastings')\n",
"airborne_data.pop('Xt')\n",
"airborne_data.pop('Yt')\n",
"airborne_data.pop('Zt')\n",
@@ -292,14 +284,14 @@
},
{
"cell_type": "code",
"execution_count": 43,
"execution_count": 73,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['__header__', '__version__', '__globals__', 'imageRGB', 'imageIR']\n"
"['__header__', '__version__', '__globals__', 'imageRGB', 'imageIR', 'northings', 'eastings', 'altitude']\n"
]
}
],
@@ -314,25 +306,27 @@
"source": [
"We've significantly reduced the file size with these steps. However, we still have 812 images, which at about 1 MB a piece leaves us with a still gargantuan ~800 MB file, far too large for Github. We are going to need to trim this down a bit.\n",
"\n",
"Once we have chosen what size subset of the data to use, in this case 25 images, we have to decide which images. For this dataset, since the sequence of images matter (we want images next to eachother since we are dealing with misalignment), we will just choose the first 25 images. For other datasets, this may not be the optimal choice."

]
},
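As a point of comparison, for datasets where adjacency doesn't matter, an evenly spaced sample usually covers the full flight better than the first 25 frames. A hedged sketch of that alternative, assuming the same `(height, width, n_images)` array layout used in this notebook:

```python
import numpy as np

subset_size = 25
n_images = airborne_data['imageRGB'].shape[2]  # 812 in the full dataset

# Evenly spaced frame indices across the whole sequence,
# rather than the contiguous run chosen above.
indices = np.linspace(0, n_images - 1, subset_size).astype(int)

sampled_rgb = airborne_data['imageRGB'][:, :, indices]
sampled_ir = airborne_data['imageIR'][:, :, indices]
```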
{
"cell_type": "code",
"execution_count": 48,
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"trimmed_rgb = airborne_data['imageRGB'][:,:,0:25]\n",
"trimmed_ir = airborne_data['imageIR'][:,:,0:25]\n",
"subset_size = 25\n",
"\n",
"trimmed_rgb = airborne_data['imageRGB'][:,:,0:subset_size]\n",
"trimmed_ir = airborne_data['imageIR'][:,:,0:subset_size]\n",
"\n",
"airborne_data['imageRGB'] = trimmed_rgb\n",
"airborne_data['imageIR'] = trimmed_ir"
]
},
{
"cell_type": "code",
"execution_count": 49,
"execution_count": 75,
"metadata": {},
"outputs": [
{
@@ -359,17 +353,35 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"After verifying we have succesfully extracted our subset of images, we can tell scipy to save our file."

]
},
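One option worth knowing about: `scipy.io.savemat` takes a `do_compression` flag that zlib-compresses each variable, which can shrink the output further at some I/O cost. A minimal sketch using the same path as the cell below:

```python
import scipy.io

# Same save as below, but with per-variable zlib compression enabled.
scipy.io.savemat("../data/AirborneDataMini.mat", airborne_data,
                 do_compression=True)
```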
{
"cell_type": "code",
"execution_count": 50,
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"airborne_data_path = \"../data/AirborneDataMini.mat\"\n",
"scipy.io.savemat(airborne_data_path, airborne_data)"
"airborne_minidata_path = \"../data/AirborneDataMini.mat\"\n",
"scipy.io.savemat(airborne_minidata_path, airborne_data)"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Size reduced by: 1382450025 bytes\n"
]
}
],
"source": [
"minified_size = os.path.getsize(airborne_minidata_path)\n",
"print(\"Size reduced by:\", original_size - minified_size, \"bytes\")"
]
}
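A raw byte count is hard to eyeball; a small hypothetical helper (not in the notebook) converts it to binary units:

```python
def human_readable(n_bytes):
    # Step up through binary units until the number is readable.
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n_bytes < 1024 or unit == "GiB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1024

print("Size reduced by:", human_readable(original_size - minified_size))
```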
],
@@ -389,7 +401,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.10.12"
},
"orig_nbformat": 4,
"vscode": {
119 changes: 61 additions & 58 deletions book/chapters/realignment.ipynb

Large diffs are not rendered by default.
