Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
update student version with curriculum book changes
1 parent
e33ceb5
commit 5cb4c7e
Showing
10 changed files
with
175,483 additions
and
174,940 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,43 @@ | ||
# GeoSMART Curriculum Jupyter Book (ESS 469/569) | ||
|
||
[![Deploy](https://github.com/geo-smart/mlgeo-book/actions/workflows/deploy.yaml/badge.svg)](https://github.com/geo-smart/mlgeo-book/actions/workflows/deploy.yaml) | ||
[![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://geo-smart.github.io/mlgeo-book) | ||
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/geo-smart/mlgeo-book/HEAD?urlpath=lab) | ||
[![Deploy](https://github.com/geo-smart/mlgeo-instructor/actions/workflows/deploy.yaml/badge.svg)](https://github.com/geo-smart/mlgeo-instructor/actions/workflows/deploy.yaml) | ||
[![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://geo-smart.github.io/mlgeo-instructor) | ||
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/geo-smart/mlgeo-instructor/HEAD?urlpath=lab) | ||
[![GeoSMART Library Badge](book/img/curricula_badge.svg)](https://geo-smart.github.io/curriculum) | ||
[![Student Version](book/img/student_version_badge.svg)](https://geo-smart.github.io/mlgeo-book/) | ||
|
||
## About | ||
## Repository Overview | ||
|
||
This repository stores configuration for GeoSMART curriculum content, specifically the student version of the book. This version of the book should never be directly edited, as the student version is automatically generated on push. | ||
This repository stores configuration for GeoSMART curriculum content, specifically the teacher version of the book. Only this version of the book should ever be edited, as the student version is automatically generated on push by GitHub Actions. | ||
|
||
## Making Changes | ||
|
||
Edit the book content by modifying the `_config.yml`, `_toc.yml`, and `*.ipynb` files in the `book` directory. The book is hosted on GitHub Pages and is updated automatically on push; the student book is also generated automatically on push. | ||
|
||
Before pushing changes, set up a conda environment and build the book locally to make sure it will also build with GitHub Actions. We accept rendered notebooks, but some oddities, such as kernels other than Python, will make the build crash. We therefore recommend that contributors first build the book with the added notebooks. | ||
|
||
```sh | ||
conda env create -f ./conda/environment.yml | ||
conda activate curriculum_book | ||
|
||
``` | ||
|
||
To change how this book differs from the student book, edit `.github/workflows/clean_book.py`. When you push, a GitHub Actions workflow clones the repo and runs this Python file, which modifies certain parts of the `*.ipynb` file contents and then pushes to the student repo. To edit the student repo's README, edit `STUDENT_README.md`. The GitHub Actions workflow also automatically replaces `README.md` with `STUDENT_README.md` in the student repo. | ||
|
||
### `Student Response Sections` | ||
|
||
One modification made by the `clean_book.py` workflow is to clear sections marked for student response. Code cells marked for student response may contain code in the teacher version of the book, but will have their code removed and replaced with a TODO comment in the student version. | ||
|
||
To mark a code cell to be cleared, insert a markdown cell directly preceding it with the following content: | ||
|
||
````markdown | ||
```{admonition} Student response section | ||
This section is left for the student to complete. | ||
``` | ||
```` | ||
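
The clearing step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the contents of `clean_book.py`: the function name `clear_student_cells`, the exact TODO text, and the choice to also reset outputs are assumptions, and the notebook is represented as a plain dictionary in the layout of the `.ipynb` JSON format.

```python
import copy

FENCE = "`" * 3  # three backticks, built here to keep this example readable
MARKER = FENCE + "{admonition} Student response section"
TODO_COMMENT = "# TODO: complete this section"  # assumed placeholder text


def clear_student_cells(notebook: dict) -> dict:
    """Return a copy of the notebook with marked code cells cleared.

    Hypothetical helper: a code cell is cleared when the cell directly
    before it is a markdown cell containing the admonition marker.
    """
    cleaned = copy.deepcopy(notebook)
    cells = cleaned.get("cells", [])
    for previous, current in zip(cells, cells[1:]):
        source = previous.get("source", "")
        # Notebook JSON stores a cell source as a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        is_marker = previous.get("cell_type") == "markdown" and MARKER in text
        if is_marker and current.get("cell_type") == "code":
            current["source"] = TODO_COMMENT
            current["outputs"] = []
            current["execution_count"] = None
    return cleaned


# A tiny in-memory notebook: one marker cell, one marked code cell,
# and one unmarked code cell that must stay untouched.
example = {
    "cells": [
        {
            "cell_type": "markdown",
            "source": [
                MARKER + "\n",
                "This section is left for the student to complete.\n",
                FENCE,
            ],
        },
        {"cell_type": "code", "source": "answer = 42", "outputs": [], "execution_count": 1},
        {"cell_type": "code", "source": "print('kept')", "outputs": [], "execution_count": 2},
    ]
}

cleaned = clear_student_cells(example)
print(cleaned["cells"][1]["source"])
print(cleaned["cells"][2]["source"])
```

Working on a deep copy keeps the teacher notebook unchanged, which matches the requirement that only the student version loses the code.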
|
||
## Serving Locally | ||
|
||
Activate the `curriculum_book` conda environment (or any conda environment that has the necessary jupyter book dependencies). Navigate to the root folder of the curriculum book repository in anaconda prompt, then run `python server.py`. | ||
|
||
On startup, the server runs `jb build book` to build all changes to the notebooks and create the compiled HTML. The server accepts a `--no-build` flag (or the `--nb` shorthand) if you don't want to build the changes you've made to the notebooks. In that case, you can run `python server.py --nb` from any terminal with Python installed. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# GeoSMART Curriculum Jupyter Book (ESS 469/569) | ||
|
||
[![Deploy](https://github.com/geo-smart/mlgeo-book/actions/workflows/deploy.yaml/badge.svg)](https://github.com/geo-smart/mlgeo-book/actions/workflows/deploy.yaml) | ||
[![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://geo-smart.github.io/mlgeo-book) | ||
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/geo-smart/mlgeo-book/HEAD?urlpath=lab) | ||
[![GeoSMART Library Badge](book/img/curricula_badge.svg)](https://geo-smart.github.io/curriculum) | ||
|
||
## About | ||
|
||
This repository stores configuration for GeoSMART curriculum content, specifically the student version of the book. This version of the book should never be directly edited, as the student version is automatically generated on push. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,72 @@ | ||
# 2.1 Data Definitions | ||
|
||
Geoscientific data is particularly diverse: point measurements of soil moisture, high rate time series (1000 samples per second) seismograms, rasterized LandSAT imagery, Geospatial and Temporal simulated geophysical fields. | ||
Data is foundational to geosciences, allowing us to observe, model, and predict natural processes. Understanding the various types of data, their formats, and how they are structured is key to using them effectively in research and applications. In this lecture, we will discuss the data modalities encountered in geoscience, typical data formats, arrays, and data frames. Geoscientific data is particularly diverse: point measurements of soil moisture, high-rate seismogram time series (1000 samples per second), rasterized Landsat imagery, and simulated geophysical fields that vary in space and time. | ||
|
||
|
||
## The data modality | ||
Modality refers to the field, or genre of measurements. Different modalities may be seismograms, GPS displacement time series, surface air temperature time series. All of them are point-based measurements, share the same data type (1D arrays), could be saved in the same data format (e.g., CSV file), but sense different physical fields. | ||
<!-- For Vscode --> | ||
![Geoscientific Temporal Data](Dalle-geoscientific-data.png) | ||
|
||
**The data type** refers to the type of an object. Geoscientific data is often *numeric* (floats, integers), which means you can calculate with it. It can also be *categorical* (i.e., qualitative or nominal). | ||
<!-- For Jupyter Book --> | ||
```{figure} Dalle-geoscientific-data.png | ||
:width: 400px | ||
--- | ||
name: Geoscientific Data AI-Art | ||
alt: Geoscientific Data AI-Art | ||
--- | ||
AI-Art from Dall-e: geoscientific data with dataframes, geospatial, and temporal data. | ||
``` | ||
*AI-Art from Dall-e: geoscientific data with dataframes, geospatial, and temporal data.* | ||
|
||
**The data format** refers to the specific parsing schema of a file (H5, CSV, JSON). A format can be binary (H5), use standard character encodings (CSV, JSON), or be compressed (H5, Parquet). More details are in Chapter 2.5. | ||
|
||
The difference in dimensionality among geoscientific data challenges the design of machine learning models across disciplines. In most machine-learning practice, data modalities are classified by **dimension**. One example is a geophysical model that combines satellite imagery (2D in space) with time series (1D in time) from point-based sensor measurements to predict an output. | ||
--- | ||
## Data modality in Geosciences | ||
In geosciences, data come in multiple modalities depending on the source, nature of the measurements, and intended applications: | ||
|
||
## Data Frames | ||
A **DataFrame** is a tabular data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. A DataFrame resembles a table in a relational database. Its *data schema* defines how data is organized within the DataFrame: it maps each column name to the values stored in that column. | ||
|
||
Data frames can be saved in row-based file formats (comma-separated values, CSV) or column-based formats (Parquet). | ||
* **In-situ Data**: Measurements taken directly at the site of interest. In-situ data often comes as time series. Examples include: | ||
* Temperature readings from weather stations. | ||
* Seismic wave data from seismographs. | ||
* Soil moisture content from field sensors. | ||
* **Remote Sensing Data**: Collected from instruments not in direct contact with the object of study, often using satellites, drones, or aircraft. *Geospatial Data* are tied to specific locations on Earth’s surface, often represented as maps or grids (e.g., GIS data). Examples include: | ||
* Spectral data (e.g., multispectral or hyperspectral images) from satellites. | ||
* Topography data using LiDAR or radar systems. | ||
* Sea surface temperature from satellites. | ||
* **Model Data**: Simulated data generated from computational models. For example: | ||
* Climate models predicting future temperatures or precipitation. | ||
* Hydrological models simulating water flow in river basins. | ||
* **Geophysical Data**: Subsurface measurements derived through indirect methods like seismic surveys, gravity, or magnetic studies. | ||
|
||
|
||
## Data Formats in Geosciences | ||
Geoscientific data is typically stored in formats that optimize storage, access, and sharing. Common formats include: | ||
|
||
**NetCDF** (Network Common Data Form): Commonly used for multidimensional scientific data, such as atmospheric, oceanic, or climate model outputs. It efficiently stores array-based data with metadata. | ||
|
||
**HDF** (Hierarchical Data Format): Similar to NetCDF but more general, used for large datasets including satellite imagery. | ||
|
||
**CSV** (Comma-Separated Values): A simple format for tabular data. It's human-readable and widely supported across software, but less efficient for large or multidimensional datasets. | ||
|
||
**GeoTIFF**: A popular format for raster geospatial data, often used in remote sensing and GIS applications. | ||
|
||
**Shapefiles**: A vector data format for geographic information system (GIS) software, which contains geometric locations and attribute information of spatial features. | ||
|
||
Most of these formats are not cloud-optimized. We will next explore newer formats designed to accommodate large cloud storage systems. | ||
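
As a small illustration of the CSV format listed above, the following sketch writes and reads a table using only the Python standard library. The station names and soil moisture values are made up for the example, and the data is kept in memory rather than written to a file.

```python
import csv
import io

# Made-up point measurements, one dictionary per observation.
rows = [
    {"station": "A01", "time": "2024-01-01T00:00:00", "soil_moisture": 0.31},
    {"station": "A02", "time": "2024-01-01T00:00:00", "soil_moisture": 0.27},
]

# Write the rows as CSV text into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["station", "time", "soil_moisture"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back: CSV carries no type information,
# so every value is returned as a string.
records = list(csv.DictReader(io.StringIO(csv_text)))
print(type(records[0]["soil_moisture"]))

# Numeric columns have to be converted explicitly.
moisture = [float(record["soil_moisture"]) for record in records]
print(moisture)
```

The explicit conversion at the end is one reason CSV is less convenient for large or multidimensional datasets: types, units, and other metadata are not stored in the file itself, unlike in NetCDF or HDF.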
|
||
## Arrays | ||
An array is a fundamental data structure used to store collections of values, often representing multidimensional data (e.g., gridded spatial data). Arrays in geosciences typically represent data like temperature, pressure, or rainfall on a grid. | ||
|
||
Typical Dimensions of Arrays: | ||
* 1D Arrays: A single sequence of data, such as temperature measurements over time at one location. | ||
* 2D Arrays: Often represent gridded spatial data (e.g., a map of precipitation over a region). | ||
* 3D Arrays: Can include additional dimensions, such as time or depth. Examples include temperature at various depths and over time for a given region, a 3D Earth model of geophysical properties such as seismic wavespeed, and time-varying snapshots of seismic wavefields. | ||
* 4D Arrays: Add even more complexity, such as a time-varying 3D grid (e.g., atmospheric data changing over space and time) or time-lapse images of subsurface properties. | ||
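
The dimensions listed above can be sketched with NumPy. The grid sizes and variable names below are arbitrary choices for illustration, and the arrays are filled with zeros rather than real measurements.

```python
import numpy as np

# 1D: one value per day at a single location.
time_series = np.zeros(365)

# 2D: a coarse latitude x longitude grid, such as a precipitation map.
grid = np.zeros((18, 36))

# 3D: time x latitude x longitude, such as twelve monthly maps.
cube = np.zeros((12, 18, 36))

# 4D: time x depth x latitude x longitude, a time-varying 3D grid.
hypercube = np.zeros((12, 5, 18, 36))

for name, array in [
    ("time_series", time_series),
    ("grid", grid),
    ("cube", cube),
    ("hypercube", hypercube),
]:
    print(name, array.ndim, array.shape)

# Reducing along one axis removes that dimension:
# averaging the 3D cube over time gives back a 2D map.
mean_map = cube.mean(axis=0)
print(mean_map.shape)
```

Keeping a fixed, documented axis order (here time first, then depth, then the horizontal grid) is a design choice that makes such reductions easier to read.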
|
||
## Data Frames | ||
A data frame is a two-dimensional, tabular data structure, commonly used in data analysis. Data frames can be thought of as equivalent to a spreadsheet or database table, where: | ||
|
||
* Each **column** represents a variable or feature (e.g., date, location, temperature). | ||
* Each **row** corresponds to an observation or data point. | ||
Data frames are popular in programming environments like R and Python (via the Pandas library) because they offer flexibility in handling mixed data types (numerical, categorical, etc.) and are ideal for statistical analysis and data manipulation. | ||
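
A minimal sketch of such a data frame with pandas follows. The column names and values are made up for illustration. It shows columns as variables, rows as observations, and mixed data types in one table.

```python
import pandas as pd

# Each column is a variable, each row is one observation (made-up values).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"]
        ),
        "station": ["A01", "A01", "B02", "B02"],
        "temperature_c": [3.5, 4.1, -1.2, 0.3],
    }
)

# Mark the station column as categorical data.
df["station"] = df["station"].astype("category")

# One table holds datetime, categorical, and numeric columns.
print(df.dtypes)

# A simple statistical summary: mean temperature per station.
mean_by_station = df.groupby("station", observed=True)["temperature_c"].mean()
print(mean_by_station)
```

Storing the station identifier as a categorical column is optional here. It is shown only to make the distinction between numeric and categorical data explicit.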
|
||
[Lecture Slides](../../img/Google_Slides_Logo.svg)[! (https://docs.google.com/presentation/d/1PVu8vbYtX0G4W41TB537Irm5V845E4uPsIrWQRfoQB0/edit?usp=sharing) | ||
See Lecture Slides. | ||
## Lecture Slides | ||
[![Lecture Slides](../img/Google_Slides_Logo.svg)](https://docs.google.com/presentation/d/1PVu8vbYtX0G4W41TB537Irm5V845E4uPsIrWQRfoQB0/edit?usp=sharing) |