GeoSMART Hackweek Progress #1

Merged
merged 13 commits into from
Oct 27, 2023
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Ignore the massive dataset
book/data/AirborneData.mat
book/out
book/out/
book/data/img
book/data/model
.DS_Store
859 changes: 740 additions & 119 deletions book/chapters/masking.ipynb

Large diffs are not rendered by default.

82 changes: 47 additions & 35 deletions book/chapters/minify_skip.ipynb
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
@@ -38,18 +38,19 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"airborne_data_path = \"../data/AirborneData.mat\"\n",
"assert os.path.exists(airborne_data_path)\n",
"airborne_data = scipy.io.loadmat(airborne_data_path)"
"airborne_data = scipy.io.loadmat(airborne_data_path)\n",
"original_size = os.path.getsize(airborne_data_path) "
]
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": 66,
"metadata": {},
"outputs": [
{
@@ -66,7 +67,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": 67,
"metadata": {},
"outputs": [
{
@@ -90,12 +91,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It is unclear if any of these are important, but they don't contibute much to the file size and don't get in our way so no reason to bother ourselves with removing them."
"It is unclear if any of these will prove important to our data processing, but they don't contibute almost anything to the file size and don't get in our way so no reason to bother ourselves with removing them."

]
},
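Rather than taking that claim on faith, the per-key memory footprint is easy to tally. A minimal sketch, assuming the usual `loadmat` output of NumPy arrays plus a few metadata strings (the `report_key_sizes` helper is ours, not part of the notebook):

```python
import numpy as np

def report_key_sizes(mat_dict):
    """Print the approximate in-memory footprint of each key in a loadmat dict."""
    for key, value in mat_dict.items():
        if isinstance(value, np.ndarray):
            print(f"{key}: {value.nbytes / 1e6:.1f} MB")
        else:
            # __header__, __version__, __globals__ are tiny metadata entries.
            print(f"{key}: metadata (negligible)")

report_key_sizes(airborne_data)
```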
{
"cell_type": "code",
"execution_count": 38,
"execution_count": 68,
"metadata": {},
"outputs": [
{
@@ -127,7 +128,7 @@
},
{
"cell_type": "code",
"execution_count": 39,
"execution_count": 69,
"metadata": {},
"outputs": [
{
@@ -143,10 +144,6 @@
"<class 'numpy.ndarray'>\n",
"(406, 5)\n",
"\n",
"altitude:\n",
"<class 'numpy.ndarray'>\n",
"(406, 1)\n",
"\n",
"datePDT:\n",
"<class 'numpy.ndarray'>\n",
"(406,)\n"
@@ -162,10 +159,6 @@
"print(type(airborne_data['tempRiver']))\n",
"print(airborne_data['tempRiver'].shape)\n",
"\n",
"print('\\naltitude:')\n",
"print(type(airborne_data['altitude']))\n",
"print(airborne_data['altitude'].shape)\n",
"\n",
"print('\\ndatePDT:')\n",
"print(type(airborne_data['datePDT']))\n",
"print(airborne_data['datePDT'].shape)"
@@ -176,12 +169,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As far as I know, `maskRiver` and `tempRiver` are abandoned old work from a few years ago when another researched tried to do some processing on this dataset. Trimming them will greatly reduce the dataset size. The `altitude` data may come in handy later, but at the moment it isn't useful to us."
"It seems `maskRiver` and `tempRiver` are abandoned old work from a few years ago when another researched tried to do some processing on this dataset. Trimming them will greatly reduce the dataset size."
]
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": 70,
"metadata": {},
"outputs": [
{
@@ -190,15 +183,14 @@
"''"
]
},
"execution_count": 40,
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"airborne_data.pop('maskRiver')\n",
"airborne_data.pop('tempRiver')\n",
"airborne_data.pop('altitude')\n",
"airborne_data.pop('datePDT')\n",
";"
]
@@ -213,7 +205,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": 71,
"metadata": {},
"outputs": [
{
@@ -267,7 +259,7 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": 72,
"metadata": {},
"outputs": [
{
@@ -276,14 +268,14 @@
"''"
]
},
"execution_count": 42,
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"airborne_data.pop('northings')\n",
"airborne_data.pop('eastings')\n",
"# airborne_data.pop('northings')\n",
"# airborne_data.pop('eastings')\n",
"airborne_data.pop('Xt')\n",
"airborne_data.pop('Yt')\n",
"airborne_data.pop('Zt')\n",
@@ -292,14 +284,14 @@
},
{
"cell_type": "code",
"execution_count": 43,
"execution_count": 73,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['__header__', '__version__', '__globals__', 'imageRGB', 'imageIR']\n"
"['__header__', '__version__', '__globals__', 'imageRGB', 'imageIR', 'northings', 'eastings', 'altitude']\n"
]
}
],
@@ -314,25 +306,27 @@
"source": [
"We've significantly reduced the file size with these steps. However, we still have 812 images, which at about 1 MB a piece leaves us with a still gargantuan ~800 MB file, far too large for Github. We are going to need to trim this down a bit.\n",
"\n",
"Once we have chosen what size subset of the data to use, in this case 25 images, we have to decide which images. For this dataset, since the sequence of images matter (we want images next to eachother since we are dealing with misalignment), we will just choose the first 25 images. For other datasets, this may not be the optimal choice."

]
},
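As a point of comparison, for datasets where adjacency doesn't matter, an evenly spaced sample usually covers the full flight better than the first 25 frames. A hedged sketch of that alternative, assuming the same `(height, width, n_images)` array layout used in this notebook:

```python
import numpy as np

subset_size = 25
n_images = airborne_data['imageRGB'].shape[2]  # 812 in the full dataset

# Evenly spaced frame indices across the whole sequence,
# rather than the contiguous run chosen above.
indices = np.linspace(0, n_images - 1, subset_size).astype(int)

sampled_rgb = airborne_data['imageRGB'][:, :, indices]
sampled_ir = airborne_data['imageIR'][:, :, indices]
```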
{
"cell_type": "code",
"execution_count": 48,
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"trimmed_rgb = airborne_data['imageRGB'][:,:,0:25]\n",
"trimmed_ir = airborne_data['imageIR'][:,:,0:25]\n",
"subset_size = 25\n",
"\n",
"trimmed_rgb = airborne_data['imageRGB'][:,:,0:subset_size]\n",
"trimmed_ir = airborne_data['imageIR'][:,:,0:subset_size]\n",
"\n",
"airborne_data['imageRGB'] = trimmed_rgb\n",
"airborne_data['imageIR'] = trimmed_ir"
]
},
{
"cell_type": "code",
"execution_count": 49,
"execution_count": 75,
"metadata": {},
"outputs": [
{
@@ -359,17 +353,35 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"After verifying we have succesfully extracted our subset of images, we can tell scipy to save our file."

]
},
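One option worth knowing about: `scipy.io.savemat` takes a `do_compression` flag that zlib-compresses each variable, which can shrink the output further at some I/O cost. A minimal sketch using the same path as the cell below:

```python
import scipy.io

# Same save as below, but with per-variable zlib compression enabled.
scipy.io.savemat("../data/AirborneDataMini.mat", airborne_data,
                 do_compression=True)
```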
{
"cell_type": "code",
"execution_count": 50,
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"airborne_data_path = \"../data/AirborneDataMini.mat\"\n",
"scipy.io.savemat(airborne_data_path, airborne_data)"
"airborne_minidata_path = \"../data/AirborneDataMini.mat\"\n",
"scipy.io.savemat(airborne_minidata_path, airborne_data)"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Size reduced by: 1382450025 bytes\n"
]
}
],
"source": [
"minified_size = os.path.getsize(airborne_minidata_path)\n",
"print(\"Size reduced by:\", original_size - minified_size, \"bytes\")"
]
}
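A raw byte count is hard to eyeball; a small hypothetical helper (not in the notebook) converts it to binary units:

```python
def human_readable(n_bytes):
    # Step up through binary units until the number is readable.
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n_bytes < 1024 or unit == "GiB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1024

print("Size reduced by:", human_readable(original_size - minified_size))
```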
],
@@ -389,7 +401,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.10.12"
},
"orig_nbformat": 4,
"vscode": {
119 changes: 61 additions & 58 deletions book/chapters/realignment.ipynb

Large diffs are not rendered by default.
