Skip to content

Commit

Permalink
Ruchi's Data Exploration (#44)
Browse files Browse the repository at this point in the history
* Adding Data Exploration for IMS

* initial commit with pixi

* Added Notebook for Applying LSTM to IMS Data

* liniting successfully reformatted the file

* Add Liniting for IMS_Data_Exploration file
  • Loading branch information
ruchiasthana authored Mar 10, 2024
1 parent 600ede9 commit c7e507d
Show file tree
Hide file tree
Showing 6 changed files with 132,641 additions and 3 deletions.
4 changes: 2 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,7 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/
# pixi environments
.pixi

.DS_Store
2 changes: 1 addition & 1 deletion docs/setup.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ To get started, clone the repo, checkout this branch, using Linux install pixi w
```curl -fsSL https://pixi.sh/install.sh | bash```

For Mac, install via homebrew with
```homebrew install pixi```
```brew install pixi```

If using windows, install pixi with:
```iwr -useb https://pixi.sh/install.ps1 | iex```
Expand Down
172 changes: 172 additions & 0 deletions workspaces/ruchi/IMS-Data-Exploration/IMS_Data_Exploration.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"FileName = \"ims1998280_00UTC_24km_v1.1\"\n",
"\n",
"with open(FileName + \".asc\", \"r\") as f:\n",
" # Skip 1365 bytes within the header.\n",
" # Take all lines after the header, strip the end-line character and join all in one string.\n",
" f.seek(1365)\n",
" Lines = \" \".join(line.strip(\"\\n\") for line in f)\n",
"\n",
"# Remove all white space from Lines with Lines.split.\n",
"# If characters (s) in string are digits, convert to integers, then convert to a NumPy Array.\n",
"# Lastly, reshape to 1024 x 1024 24 km resolution.\n",
"DataArray = np.reshape(\n",
" np.asarray([int(s) for s in Lines.split() if s.isdigit()]), [1024, 1024]\n",
")\n",
"\n",
"# From the documentation (Table 3):\n",
"# For unpacked data, integer value of 164 is sea ice, while 165 is snow-covered land.\n",
"# Convert 164 to 3 (sea ice) and 165 to 4 (snow covered land) to align with packed data.\n",
"DataArray = np.where(DataArray == 164, 3, DataArray)\n",
"DataArray = np.where(DataArray == 165, 4, DataArray)\n",
"\n",
"# Create the output header to be saved to the new file.\n",
"HeaderText = \"NCOLS 1024\\nNROWS 1024\\nXLLCORNER -12126597.0\\nYLLCORNER -12126840.0\\nCELLSIZE 23684.997\"\n",
"np.savetxt(\n",
" FileName + \"_2packed.asc\",\n",
" DataArray,\n",
" header=HeaderText,\n",
" delimiter=\" \",\n",
" fmt=\"%s\",\n",
" comments=\"\",\n",
")"
],
"metadata": {
"id": "_eHgAMfU2vA-"
},
"execution_count": 12,
"outputs": []
},
{
"cell_type": "code",
"source": [
"type(DataArray), DataArray"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Viqu0MzBbt12",
"outputId": "7bed6ac3-ba4a-41c6-b4a0-d2ab3a336e7c"
},
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(numpy.ndarray,\n",
" array([[0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" ...,\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0],\n",
" [0, 0, 0, ..., 0, 0, 0]]))"
]
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "markdown",
"source": [
"Initial Data Exploration. IMS Data is stored in .asc files, which contain: (1) header with information about the data collected, and (2) an array of float, which represent different qualities of data detailed:\n",
"\n",
"- 0 (outside Northern Hemisphere).\n",
"- 1 (open water)\n",
"- 2 (land without snow)\n",
"- 3 (sea or lake ice)\n",
"- 4 (snow covered land)"
],
"metadata": {
"id": "7B0uvO3_bzEC"
}
},
{
"cell_type": "code",
"source": [
"DataArray.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "KTQArACr3FtU",
"outputId": "bfe05387-d3c4-4171-b871-7f3fbbe451b0"
},
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(1024, 1024)"
]
},
"metadata": {},
"execution_count": 16
}
]
},
{
"cell_type": "markdown",
"source": [
"The DataArray containes significantly more 0 values than other numerical values. In order to determine if there are instances of other numerrical values I ran the following lines of code below:"
],
"metadata": {
"id": "AM_aLxwXctca"
}
},
{
"cell_type": "code",
"source": [
"# Below are instances of sea and land\n",
"unique, counts = np.unique(DataArray, return_counts=True)\n",
"dict(zip(unique, counts))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "rZgNFRPY6EtI",
"outputId": "fa5eaba5-22c9-4c77-fffb-b53fcfce17f8"
},
"execution_count": 17,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{0: 1015028, 3: 10163, 4: 23385}"
]
},
"metadata": {},
"execution_count": 17
}
]
}
]
}
Loading

0 comments on commit c7e507d

Please sign in to comment.