update student version with curriculum book changes
actions-user committed Oct 2, 2024
1 parent 6e903ef commit 56b1d45
Showing 5 changed files with 175,065 additions and 174,954 deletions.
43 changes: 38 additions & 5 deletions README.md
@@ -1,10 +1,43 @@
# GeoSMART Curriculum Jupyter Book (ESS 469/569)

[![Deploy](https://github.com/geo-smart/mlgeo-book/actions/workflows/deploy.yaml/badge.svg)](https://github.com/geo-smart/mlgeo-book/actions/workflows/deploy.yaml)
[![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://geo-smart.github.io/mlgeo-book)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/geo-smart/mlgeo-book/HEAD?urlpath=lab)
[![Deploy](https://github.com/geo-smart/mlgeo-instructor/actions/workflows/deploy.yaml/badge.svg)](https://github.com/geo-smart/mlgeo-instructor/actions/workflows/deploy.yaml)
[![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://geo-smart.github.io/mlgeo-instructor)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/geo-smart/mlgeo-instructor/HEAD?urlpath=lab)
[![GeoSMART Library Badge](book/img/curricula_badge.svg)](https://geo-smart.github.io/curriculum)
[![Student Version](book/img/student_version_badge.svg)](https://geo-smart.github.io/mlgeo-book/)

## About
## Repository Overview

This repository stores configuration for GeoSMART curriculum content, specifically the student version of the book. This version of the book should never be directly edited, as the student version is automatically generated on push.
This repository stores configuration for GeoSMART curriculum content, specifically the teacher version of the book. Only this version of the book should ever be edited, as the student version is automatically generated on push by GitHub Actions.

## Making Changes

Edit the book content by modifying the `_config.yml`, `_toc.yml`, and `*.ipynb` files in the `book` directory. The book is hosted on GitHub Pages and is updated automatically on push; the student book is also generated automatically on push.

Before pushing, set up a conda environment and build the book locally to check that it will also build with GitHub Actions. We accept rendered notebooks, but some oddities, such as kernels other than Python, will make the build crash, so we recommend that contributors first build the book locally with the added notebooks.

```sh
conda env create -f ./conda/environment.yml
conda activate curriculum_book

```

To change how this book differs from the student book, edit `.github/workflows/clean_book.py`. When you push, a GitHub Actions workflow clones the repo and runs this Python file, which modifies certain parts of the `*.ipynb` file contents, then pushes the result to the student repo. To edit the student repo's README, edit `STUDENT_README.md`; the workflow automatically replaces `README.md` with `STUDENT_README.md` in the student repo.

### `Student Response Sections`

One of the modifications made by `clean_book.py` is to clear sections marked for student response. Code cells marked for student response may contain code in the teacher version of the book, but in the student version their code is removed and replaced with a TODO comment.

To mark a code cell to be cleared, insert a markdown cell directly preceding it with the following content:

````markdown
```{admonition} Student response section
This section is left for the student to complete.
```
````
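
The marker-based clearing described above can be sketched roughly as follows. This is a minimal illustration and not the actual contents of `clean_book.py`: the names `MARKER`, `clear_student_cells`, and `clean_notebook_file`, as well as the exact TODO text, are assumptions made for the example. It relies only on the fact that an `.ipynb` file is JSON with a top-level `cells` list.

```python
import json

# Assumed marker text: the admonition title used in the markdown cell above.
MARKER = "{admonition} Student response section"


def clear_student_cells(notebook: dict) -> dict:
    """Clear every code cell that directly follows a marker markdown cell."""
    cells = notebook.get("cells", [])
    for prev, cell in zip(cells, cells[1:]):
        # A cell's source may be a single string or a list of strings.
        prev_source = "".join(prev.get("source", []))
        is_marker = prev.get("cell_type") == "markdown" and MARKER in prev_source
        if is_marker and cell.get("cell_type") == "code":
            # Hypothetical placeholder; the real script may use other wording.
            cell["source"] = ["# TODO: complete this section\n"]
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clean_notebook_file(src_path: str, dst_path: str) -> None:
    """Read a notebook, clear its marked cells, and write the result."""
    with open(src_path, encoding="utf-8") as infile:
        notebook = json.load(infile)
    with open(dst_path, "w", encoding="utf-8") as outfile:
        json.dump(clear_student_cells(notebook), outfile, indent=1)
```

Only the code cell immediately after the marker is cleared, which matches the "directly preceding" rule above; any other cell is left untouched.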

## Serving Locally

Activate the `curriculum_book` conda environment (or any conda environment that has the necessary Jupyter Book dependencies). Navigate to the root folder of the curriculum book repository in Anaconda Prompt, then run `python server.py`.

On startup, the server runs `jb build book` to build all changes to the notebooks and create the compiled HTML. Pass the `--no-build` flag (or its `--nb` shorthand) to skip the build. In that case, you can run `python server.py --nb` from any terminal with Python installed.
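
For readers curious how a flag like this can be wired up, here is a hedged sketch of a build-then-serve script. It is not the repository's actual `server.py`: the `--port` option, the `localhost` binding, and the function names are assumptions for illustration. The `book/_build/html` path is Jupyter Book's default HTML output location for a book in `book`.

```python
import argparse
import functools
import http.server
import subprocess


def parse_args(argv=None):
    """Parse command-line flags; --nb is an alias for --no-build."""
    parser = argparse.ArgumentParser(description="Build and serve the book locally.")
    parser.add_argument("--no-build", "--nb", dest="no_build", action="store_true",
                        help="serve the existing HTML without rebuilding")
    # Assumed option, not documented in this README.
    parser.add_argument("--port", type=int, default=8000, help="port to serve on")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if not args.no_build:
        # Rebuild the book; raises CalledProcessError if the build fails.
        subprocess.run(["jb", "build", "book"], check=True)
    # Serve the compiled HTML from Jupyter Book's default output directory.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory="book/_build/html")
    with http.server.ThreadingHTTPServer(("localhost", args.port), handler) as httpd:
        print(f"Serving on http://localhost:{args.port}")
        httpd.serve_forever()
```

Calling `main(["--nb"])` would skip the `jb build book` step and serve whatever HTML already exists, which is why no Jupyter Book installation is needed in that mode.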
10 changes: 10 additions & 0 deletions STUDENT_README.md
@@ -0,0 +1,10 @@
# GeoSMART Curriculum Jupyter Book (ESS 469/569)

[![Deploy](https://github.com/geo-smart/mlgeo-book/actions/workflows/deploy.yaml/badge.svg)](https://github.com/geo-smart/mlgeo-book/actions/workflows/deploy.yaml)
[![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://geo-smart.github.io/mlgeo-book)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/geo-smart/mlgeo-book/HEAD?urlpath=lab)
[![GeoSMART Library Badge](book/img/curricula_badge.svg)](https://geo-smart.github.io/curriculum)

## About

This repository stores configuration for GeoSMART curriculum content, specifically the student version of the book. This version of the book should never be directly edited, as the student version is automatically generated on push.
124 changes: 93 additions & 31 deletions book/Chapter2-DataManipulation/2.2_data_formats_rendered.ipynb
@@ -10,38 +10,105 @@
"In this tutorial, we will manipulate the data structure from and to several data formats.\n",
"\n",
"\n",
"The formats that support unstructured (non relational) data are:\n",
"- JSON: JavaScript Object Notation, an open standard file format that uses human-readable text. The data may be attribute-value pairs and arrays. It is language-independent. The syntax looks like this\n",
"```\n",
"## JSON (JavaScript Object Notation)\n",
"JSON is a lightweight, human-readable data format used in web applications and APIs for data exchange. JSON is used to store metadata, configuration files, and small datasets, particularly when working with web-based applications or interacting with APIs (e.g., querying weather data or geospatial information from an API).\n",
"Here is a simple example:\n",
"```json\n",
"{\n",
" \"firstName\": \"John\",\n",
" \"lastName\": \"Smith\",\n",
" \"isAlive\": true,\n",
" \"age\": 27,\n",
" \"address\": {\n",
" \"streetAddress\": \"21 2nd Street\",\n",
" \"city\": \"New York\",\n",
" \"state\": \"NY\",\n",
" \"postalCode\": \"10021-3100\"\n",
" }\n",
" \"location\": \"Yellowstone\",\n",
" \"coordinates\": {\n",
" \"latitude\": 44.423691,\n",
" \"longitude\": -110.588516\n",
" },\n",
" \"elevation_m\": 2399,\n",
" \"temperature_c\": 22.5\n",
"}\n",
"\n",
"```\n",
"The character encoding is UTF-8. The data types in JSON files may be numbers, strings, booleans, arrays, objects (collections of name-value pairs), or null. More information on JSON is available from the [EarthDataScience course](https://www.earthdatascience.org/courses/use-data-open-source-python/intro-to-apis/apis-in-python/).\n",
"\n",
"In Python, you can create a JSON file with the built-in `json` library:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"The main formats for geospatial raster and vector data are:\n",
"- **GeoTIFF**: a metadata standard that allows georeferencing information to be embedded in a TIFF (Tagged Image File Format) file, used for pixelized raster data. The Cloud Optimized GeoTIFF (COG) variant is organized for efficient access from cloud storage.\n",
"- **GeoJSON**: a format for encoding a variety of geographic data structures (vector features such as points, lines, and polygons) in JSON.\n",
"data = {\n",
" \"location\": \"Yellowstone\",\n",
" \"coordinates\": {\"latitude\": 44.423691, \"longitude\": -110.588516},\n",
" \"elevation_m\": 2399,\n",
" \"temperature_c\": 22.5\n",
"}\n",
"\n",
"with open('data.json', 'w') as outfile:\n",
" json.dump(data, outfile)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tabular Data Formats \n",
"\n",
"Tabular data are common in geosciences. The *CSV* format is a convenient, human-readable format for *small data sets* that stores data in rows. The *Parquet* format is a machine-readable format for *large data sets* that supports compression.\n",
"\n",
"### **CSV (Comma-Separated Values)**:\n",
"\n",
"CSV is a simple, widely-used format for tabular data, often used in geoscience for sharing and storing smaller datasets (e.g., soil samples, environmental readings).\n",
"It stores data as plain text, which makes it easy to read but inefficient for large datasets. It is most often read using ``pandas``."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"The formats that support tabular data are:\n",
"- CSV\n",
"- Parquet\n",
"# Reading a CSV file\n",
"df = pd.read_csv('data.csv')\n",
"\n",
"# Writing a CSV file\n",
"df.to_csv('output.csv', index=False)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Parquet**:\n",
"Parquet is a binary, columnar storage format optimized for efficiency, particularly for large datasets. It is widely used in big data environments (e.g., storing satellite imagery or climate model outputs)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Writing the dictionary data to a Parquet file\n",
"df = pd.DataFrame([data])\n",
"df.to_parquet('data.parquet', index=False)\n",
"\n",
"The data formats for big heterogeneous data (different data types):\n",
"- NetCDF4\n",
"- HDF5\n",
"- Zarr"
"# Reading the Parquet file\n",
"df_read = pd.read_parquet('data.parquet')\n",
"print(df_read)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Geospatial Data\n",
"The main formats for geospatial raster and vector data are:\n",
"- **GeoTIFF**: a metadata standard that allows georeferencing information to be embedded in a TIFF (Tagged Image File Format) file, used for pixelized raster data. The Cloud Optimized GeoTIFF (COG) variant is organized for efficient access from cloud storage.\n",
"- **GeoJSON**: a format for encoding a variety of geographic data structures (vector features such as points, lines, and polygons) in JSON.\n"
]
},
{
@@ -583,7 +650,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.5 64-bit",
"display_name": "mlgeo",
"language": "python",
"name": "python3"
},
@@ -597,14 +664,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.9.18"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "af2b86a0d97d2bdb49befe19981ba48b79a904c391b62d75845b127da778abba"
}
}
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2