Metadata Specification Proposal #17
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
{ | ||
"RUN_CONFIG": { | ||
"description": "Run Config for Hurricane Zach", | ||
"Model": { | ||
"realization": { | ||
"Name": "CFE_SLOTH_realization", | ||
"Type": "filesystem", | ||
"Path": "/ngen/realization.json", | ||
"Hash": "d2c6fbda93c134de495d69745fae11087784d2aa" | ||
}, | ||
"configuration": { | ||
"Name": "CFE_SLOTH_ini", | ||
"Type": "filesystem", | ||
"Path": "/ngen/cfe_sloth.ini", | ||
"Hash": "ab0c3dff59c4b282b172b90128159fda3386d012" | ||
} | ||
}, | ||
"Forcings": { | ||
"inputs": { | ||
"Name": "Hurricane Zach", | ||
"Type": "bucket", | ||
"Path": "s3://awi-ciroh-ngen-data/AWI_001/forcings/", | ||
"Hash": "220fff8bdd3b85f23d93e73b4bc7e3bc2c7c0f35" | ||
}, | ||
"Hydrofabric": { | ||
"catchment": { | ||
"Name": "Catchment(s) File v1.0", | ||
"Type": "bucket", | ||
"Path": "s3://awi-ciroh-ngen-data/AWI_001/catchments.geojson", | ||
"Hash": "da39a3ee5e6b4b0d3255bfef95601890afd80709" | ||
}, | ||
"nexus": { | ||
"Name": "Nexus File v1.0", | ||
"Type": "bucket", | ||
"Path": "s3://awi-ciroh-ngen-data/AWI_001/nexus.geojson", | ||
"Hash": "cae054f62f697080d822fea9c7d9c268be8b7ac9" | ||
}, | ||
"crosswalk": { | ||
"Name": "Crosswalk File v1.0", | ||
"Type": "bucket", | ||
"Path": "s3://awi-ciroh-ngen-data/AWI_001/crosswalk.geojson", | ||
"Hash": "4c39964d1e30779f9992d3c00e94a39952cb102a" | ||
} | ||
} | ||
} | ||
} | ||
} |
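A minimal sketch of how a consumer might read this structure, assuming that any object carrying all four of `Name`, `Type`, `Path` and `Hash` is a file entry and that everything else is grouping. The helper name `iter_entries` is hypothetical and not part of the proposal; the embedded JSON is a shortened copy of the example above.

```python
import json

REQUIRED = {"Name", "Type", "Path", "Hash"}


def iter_entries(node, trail=()):
    """Yield (trail, entry) for every object that has all four required keys."""
    if not isinstance(node, dict):
        return  # plain values such as "description" are not entries
    if REQUIRED.issubset(node):
        yield trail, node
        return
    for key, child in node.items():
        # Recurse so that nested groups (Hydrofabric under Forcings) are found.
        yield from iter_entries(child, trail + (key,))


example = json.loads("""
{
  "RUN_CONFIG": {
    "description": "Run Config for Hurricane Zach",
    "Model": {
      "realization": {
        "Name": "CFE_SLOTH_realization",
        "Type": "filesystem",
        "Path": "/ngen/realization.json",
        "Hash": "d2c6fbda93c134de495d69745fae11087784d2aa"
      }
    },
    "Forcings": {
      "inputs": {
        "Name": "Hurricane Zach",
        "Type": "bucket",
        "Path": "s3://awi-ciroh-ngen-data/AWI_001/forcings/",
        "Hash": "220fff8bdd3b85f23d93e73b4bc7e3bc2c7c0f35"
      },
      "Hydrofabric": {
        "nexus": {
          "Name": "Nexus File v1.0",
          "Type": "bucket",
          "Path": "s3://awi-ciroh-ngen-data/AWI_001/nexus.geojson",
          "Hash": "cae054f62f697080d822fea9c7d9c268be8b7ac9"
        }
      }
    }
  }
}
""")

entries = list(iter_entries(example["RUN_CONFIG"]))
for trail, entry in entries:
    print("/".join(trail), entry["Type"], entry["Path"])
```

Because the walk is recursive, it does not depend on whether `Hydrofabric` ends up nested under `Forcings` or as a sibling of it.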
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,182 +1,56 @@ | ||
# Welcome to NextGen Framework National Water Model Community Repo. (NextGen In A Box). | ||
|
||
We are doing a case study: NWM run for Sipsey Fork, Black Warrior River | ||
- We don’t want to run all of CONUS | ||
- We want to run NextGen locally | ||
- We want to have control over inputs / config. | ||
- How can we do it? Answer: NextGen In A Box | ||
|
||
This repository contains : | ||
- **Dockerfile** for running NextGen Framework (docker/Dockerfile*) | ||
- **Terraform** configuration files for provisioning infrastructure in AWS (terraform/README.md) | ||
- Documentation of how to use the **infrastructure** and run the model. (README.md) | ||
|
||
## Table of Contents | ||
* [Prerequisites:](#prerequisites-) | ||
+ [Install docker](#install-docker-) | ||
+ [Install WSL on Windows](#Install-WSL-on-Windows-) | ||
+ [Download the input data in "ngen-data" folder from S3 bucket ](#download-the-input-data-in--ngen-data--folder-from-s3-bucket--) | ||
- [Linux & Mac](#linux---mac) | ||
- [Windows Steps:](#windows-steps-) | ||
* [Run NextGen-In-A-Box](#run-nextgen-in-a-box) | ||
+ [Clone CloudInfra repo](#clone-cloudinfra-repo) | ||
+ [How to run the model script?](#how-to-run-the-model-script-) | ||
+ [Output of the model script](#output-of-the-model-script) | ||
|
||
|
||
## Prerequisites: | ||
|
||
### Install docker and validate docker is up: | ||
- On *Windows*: | ||
- [Install Docker Desktop on Windows](https://docs.docker.com/desktop/install/windows-install/#install-docker-desktop-on-windows) | ||
    - Once docker is installed, start Docker Desktop. | ||
- Open powershell -> right click and `Run as an Administrator` | ||
- Type `docker ps -a` to make sure docker is working. | ||
|
||
- On *Mac*: | ||
- [Install docker on Mac](https://docs.docker.com/desktop/install/mac-install/) | ||
- Once docker is installed, start Docker Desktop. | ||
- Open terminal app | ||
- Type `docker ps -a` to make sure docker is working. | ||
|
||
- On *Linux*: | ||
- [Install docker on Linux](https://docs.docker.com/desktop/install/linux-install/) | ||
- Follow similar steps as *Mac* for starting Docker and verifying the installation | ||
|
||
### Install WSL on Windows: | ||
|
||
1. Open **PowerShell** as an administrator. | ||
|
||
2. Run the following command to enable WSL feature: | ||
``` | ||
wsl --install | ||
``` | ||
|
||
3. Wait for the installation to complete. It may take some time as it will download and install the necessary components. | ||
|
||
4. Once the installation is finished, you will be prompted to restart your computer. Type `Y` and press Enter to restart. | ||
|
||
5. After the computer restarts, open **Microsoft Store**. | ||
|
||
6. Search for "WSL" or "Windows Subsystem for Linux" in the search bar. | ||
|
||
7. Select the desired Linux distribution (e.g., Ubuntu, Debian, Fedora) from the search results. | ||
|
||
8. Click on the distribution and then click the **Install** button. | ||
|
||
9. Wait for the installation to complete. The installation process will download the Linux distribution package from the Microsoft Store. | ||
|
||
10. Once the installation is finished, you can launch the Linux distribution from the Start menu or by running its command (e.g., `ubuntu`). | ||
|
||
11. The first time you launch the Linux distribution, it will take some time to set up. Follow the on-screen instructions to create a username and password. | ||
|
||
12. After the setup is complete, you can use the Linux distribution through WSL on your Windows system. | ||
|
||
|
||
### Download the input data in "ngen-data" folder from S3 bucket : | ||
|
||
#### Linux & Mac & WSL | ||
|
||
```Linux | ||
$ mkdir NextGen | ||
$ cd NextGen | ||
$ mkdir ngen-data | ||
$ cd ngen-data | ||
$ wget --no-parent https://ciroh-ua-ngen-data.s3.us-east-2.amazonaws.com/AWI-001/AWI_03W_113060_001.tar.gz | ||
$ tar -xf AWI_03W_113060_001.tar.gz | ||
$ cd AWI_03W_113060_001 | ||
``` | ||
|
||
|
||
#### Windows Steps: | ||
#### Note: It is recommended to use WSL and follow [instructions for Linux & Mac & WSL](#linux--mac--wsl) | ||
|
||
```Windows | ||
$ mkdir NextGen | ||
$ cd NextGen | ||
$ mkdir ngen-data | ||
$ cd ngen-data | ||
$ Invoke-WebRequest -Uri "https://ciroh-ua-ngen-data.s3.us-east-2.amazonaws.com/AWI-001/AWI_03W_113060_001.tar.gz" -OutFile "AWI_03W_113060_001.tar.gz" | ||
$ tar -xzf ".\AWI_03W_113060_001.tar.gz" | ||
$ cd AWI_03W_113060_001 | ||
``` | ||
|
||
## Run NextGen In A Box | ||
|
||
### Clone CloudInfra repo | ||
|
||
Navigate to NextGen directory and clone the repo using below commands: | ||
|
||
``` | ||
$ git clone https://github.com/CIROH-UA/CloudInfra.git | ||
|
||
$ cd CloudInfra | ||
``` | ||
Once you are in *CloudInfra* directory, you should see `guide.sh` in it. Now, we are ready to run the model using that script. | ||
|
||
### How to run the model script? | ||
|
||
#### WSL, Linux and Mac Steps: | ||
Follow below steps to run `guide.sh` script | ||
``` | ||
# Note: Make sure you are in ~/Documents/NextGen/CloudInfra directory | ||
$ ./guide.sh | ||
|
||
``` | ||
### Output of the model guide script | ||
|
||
>*What you will see when you run above `guide.sh`?* | ||
|
||
- The script prompts the user to enter the file path for the input data directory where the forcing and config files are stored. | ||
|
||
Run the following command based on your OS and copy the path value: | ||
|
||
**Windows:** | ||
``` | ||
C:> cd ~\<path>\NextGen\ngen-data | ||
c:> pwd | ||
and copy the path | ||
### Reproducing | ||
|
||
Get the files from the appropriate bucket & RUN_CONFIG | ||
``` | ||
wget --no-parent https://awi-ciroh-ngen-data.s3.us-east-2.amazonaws.com/AWI_001/AWI_03W_113060_001.tar.gz | ||
Review comment: I would suggest using |
||
|
||
tar -xvf AWI_03W_113060_001.tar.gz | ||
``` | ||
Then we can confirm file location and integrity against the example JSON file. | ||
``` | ||
{ | ||
"RUN_CONFIG": { | ||
"description": "Run Config for AWI_03W_113060_001", | ||
"Model": { | ||
"realization": { | ||
Review comment: Thanks for getting this going, @ZacharyWills! Having now stared at this for 30 minutes and gone back and forth about what comments I want to make, it seems clear to me that we could solve this problem in a lot of different ways. But, as with all solutions, they all have pros and cons.

My biggest takeaway so far is that this initial proposal should set up future proposals for success. I think the way we do that is by going with the simplest data model that does not limit future additions in places where we want expansion.

To get to that place, I think we first need some guard rails on what the spec is not. I figure we won't be able to determine all the things it is to start, without using it, so it is likely easier to say what it isn't. So kind of like goals, but anti-goals if you will haha. The purpose of saying what the spec isn't is to create consensus, avoid scope creep, and have something to point to when suggestions are brought to the table. Please disagree and challenge me on this. It may just be an idea in my head and not a good one!

My general thoughts so far are (numbered, so it is easier to respond. Not ordered):
I don't see it the same. I think you can make the case that you can have forcing over a given hydrofabric domain, so
{
  "Meta": {
    "Hash": {
      "Algo": "sha256"
    }
  }
}
Review comment: So the hierarchy is based off of the assumption that things within the workflow are somewhat dependent (generating forcings for only your selected hydrofabric is more efficient than always generating them everywhere) but there's a lot of discussion about that going on now here: owp-spatial/hfsubsetCLI#28

I totally agree on the syntax changes and I'll put some work in to fix the proposal to reflect those insights. For the most part I just wanted something to get the ball rolling on how

Thanks again for taking a look at this and any input is so valuable at this point to get us going from 0 -> something. |
||
"Name": "AWI_simplified_realization", | ||
"Type": "filesystem", | ||
"Path": "AWI_03W_113060_001/config/awi_simplified_realization.json", | ||
"Hash": "792554dcf48b61120cfc648cc6711d2b5e61d321" | ||
}, | ||
"configuration": { | ||
"Name": "CFE_SLOTH_ini", | ||
"Type": "filesystem", | ||
"Path": "AWI_03W_113060_001/config/awi_config.ini", | ||
"Hash": "e8283864026040ce1ce5a7dca79b9f4f04744b47" | ||
} | ||
}, | ||
"Forcings": { | ||
"inputs": { | ||
"Name": "Hurricane Zach", | ||
"Type": "filesystem", | ||
"Path": "AWI_03W_113060_001/forcings", | ||
"Hash": "da39a3ee5e6b4b0d3255bfef95601890afd80709" | ||
}, | ||
"Hydrofabric": { | ||
"catchment": { | ||
"Name": "Catchment(s) File v1.0", | ||
"Type": "bucket", | ||
"Path": "AWI_03W_113060_001/config/catchment_data.geojson", | ||
"Hash": "880feb145f254976600bd8968ef730105de6cbee" | ||
}, | ||
"nexus": { | ||
"Name": "Nexus File v1.0", | ||
"Type": "bucket", | ||
"Path": "AWI_03W_113060_001/config/nexus_data.geojson", | ||
"Hash": "86a029a15e7cf67bc69f2390038a74b69b09af04" | ||
        } | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
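A minimal sketch of the confirmation step, assuming the 40-character hashes in the example are SHA-1 digests of the file contents (the proposal does not state the algorithm) and that `Path` is relative to the directory where the archive was extracted. `file_digest` and `check_entry` are hypothetical helper names, and the demo uses a throwaway file rather than the real AWI data.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algo="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_entry(entry, root="."):
    """Return 'ok', 'missing', 'mismatch' or 'skipped' for one RUN_CONFIG entry."""
    if entry["Type"] != "filesystem":
        return "skipped"  # bucket entries would need to be fetched first
    path = Path(root) / entry["Path"]
    if not path.is_file():
        return "missing"
    return "ok" if file_digest(path) == entry["Hash"] else "mismatch"


# Demo with a temporary file standing in for a real config file.
with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "config" / "awi_config.ini"
    target.parent.mkdir()
    target.write_bytes(b"example contents\n")
    entry = {
        "Name": "CFE_SLOTH_ini",
        "Type": "filesystem",
        "Path": "config/awi_config.ini",
        "Hash": hashlib.sha1(b"example contents\n").hexdigest(),
    }
    results = [
        check_entry(entry, root),
        check_entry({**entry, "Hash": "0" * 40}, root),
        check_entry({**entry, "Path": "config/absent.ini"}, root),
        check_entry({**entry, "Type": "bucket"}, root),
    ]

print(results)
```

Reporting a status per entry, instead of stopping at the first failure, makes it easier to see at a glance which part of a run differs.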
Note that to recreate the sums for the forcings files, I simply summed the files then piped the output to its own sum. | ||
``` | ||
shasum -a 256 AWI_03W_113060_001/forcings/* | shasum | ||
``` | ||
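The same idea can be sketched in Python: hash each file with SHA-256, build a listing of `digest  name` lines, then hash that listing with SHA-1, which is the default algorithm of a bare `shasum`. This is an illustration under assumptions, not a byte-for-byte reimplementation of the shell pipeline: the hypothetical `directory_digest` sorts entries itself and records only the file name, so its result will not necessarily equal the pipeline's output. One value is worth recognising when checking results: `da39a3ee5e6b4b0d3255bfef95601890afd80709` is the SHA-1 of empty input, so a directory hash equal to it suggests that nothing reached the final hashing step.

```python
import hashlib
import tempfile
from pathlib import Path

# SHA-1 of zero bytes; seeing this as a directory hash means no data was hashed.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def directory_digest(directory):
    """SHA-1 over a sorted listing of per-file SHA-256 digests."""
    listing = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            # read_bytes() is fine for a sketch; chunked reads suit large forcings.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            listing.append(f"{digest}  {path.name}\n")
    return hashlib.sha1("".join(listing).encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as root:
    forcings = Path(root) / "forcings"
    forcings.mkdir()
    empty_result = directory_digest(forcings)

    (forcings / "cat-2.csv").write_text("time,precip\n0,0.1\n")
    (forcings / "cat-1.csv").write_text("time,precip\n0,0.0\n")
    first = directory_digest(forcings)

    (forcings / "cat-1.csv").write_text("time,precip\n0,0.5\n")
    second = directory_digest(forcings)

print(empty_result == EMPTY_SHA1, first != second)
```

Sorting the listing keeps the digest independent of file creation order, so two copies of the same forcings directory compare equal.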
|
||
**Linux/Mac:** | ||
``` | ||
$ cd ~/<path>/NextGen/ngen-data | ||
$ pwd | ||
and copy the path | ||
|
||
``` | ||
where <path> is the location of the NextGen folder. | ||
|
||
- The script sets the entered directory as the `HOST_DATA_PATH` variable and uses it to find all the catchment, nexus, and realization files using the `find` command. | ||
- Next, the user is asked whether to run NextGen or exit. If `run_NextGen` is selected, the script pulls the related image from the awiciroh Dockerhub, based on the local machine's architecture: | ||
``` | ||
For Mac (arm architecture), it pulls awiciroh/ciroh-ngen-image:latest-arm. | ||
For x86 machines, it pulls awiciroh/ciroh-ngen-image:latest-x86. | ||
``` | ||
|
||
- The user is then prompted to select whether they want to run the model in parallel or serial mode. | ||
- If the user selects parallel mode, the script uses the `mpirun` command to run the model and generates a partition file for the NGEN model. | ||
    - The user then selects the catchment, nexus, and realization files they want to use. | ||
|
||
Example NGEN run command for parallel mode: | ||
``` | ||
mpirun -n 2 /dmod/bin/ngen-parallel | ||
/ngen/ngen/data/config/catchments.geojson "" | ||
/ngen/ngen/data/config/nexus.geojson "" | ||
/ngen/ngen/data/config/awi_simplified_realization.json | ||
/ngen/partitions_2.json | ||
``` | ||
- If the user selects serial mode, the script runs the model directly. | ||
|
||
Example NGEN run command for serial mode: | ||
``` | ||
/dmod/bin/ngen-serial | ||
/ngen/ngen/data/config/catchments.geojson "" | ||
/ngen/ngen/data/config/nexus.geojson "" | ||
/ngen/ngen/data/config/awi_simplified_realization.json | ||
``` | ||
- After the model has finished running, the script prompts the user whether they want to continue. | ||
- If the user selects 1, the script opens an interactive shell. If the user selects 2, then the script copies the output data from container to local machine. | ||
- If the user selects 3, then the script exits. | ||
|
||
The output files are copied to the `outputs` folder in '/NextGen/ngen-data/AWI_03W_113060_001/' directory. |
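
The image selection and run commands described in the steps above can be sketched as follows. This is only an illustration of the logic as the README describes it, not the contents of `guide.sh`: the function names are hypothetical, and treating `arm64` and `aarch64` as the ARM machine names is an assumption about how the architecture is detected.

```python
import platform

IMAGE = "awiciroh/ciroh-ngen-image"


def image_for(machine=None):
    """Pick the image tag from the machine architecture (assumed names)."""
    machine = (machine or platform.machine()).lower()
    tag = "latest-arm" if machine in ("arm64", "aarch64") else "latest-x86"
    return f"{IMAGE}:{tag}"


def ngen_command(catchments, nexus, realization, partitions=None, procs=2):
    """Build the serial command, or the mpirun command when a partition file is given."""
    # The empty strings mirror the "" arguments in the example commands.
    args = [catchments, "", nexus, "", realization]
    if partitions is None:
        return ["/dmod/bin/ngen-serial", *args]
    return ["mpirun", "-n", str(procs), "/dmod/bin/ngen-parallel", *args, partitions]


serial = ngen_command(
    "/ngen/ngen/data/config/catchments.geojson",
    "/ngen/ngen/data/config/nexus.geojson",
    "/ngen/ngen/data/config/awi_simplified_realization.json",
)
parallel = ngen_command(
    "/ngen/ngen/data/config/catchments.geojson",
    "/ngen/ngen/data/config/nexus.geojson",
    "/ngen/ngen/data/config/awi_simplified_realization.json",
    partitions="/ngen/partitions_2.json",
)
print(image_for("arm64"))
print(serial)
print(parallel)
```

Returning the command as a list of arguments keeps the empty-string arguments intact when the list is passed to a process launcher.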
Review comment: General comments:
Review comment: As for general formatting, what are your thoughts on switching to a table format similar to schema.org? A schema.org Model
Review comment: I figure |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- CONFIG START --- | ||
Review comment: I think we can remove this. It kind of implies that the listed fields need to be ordered (although, I think that is a stretch). |
||
### Model | ||
| Name of Model files | type | PATH to .json file | HASH of file | | ||
| ----------- | ----------- | ----------- | ----------- | | ||
| CFE_SLOTH_realization | filesystem | /ngen/realization.json | d2c6fbda93c134de495d69745fae11087784d2aa | | ||
| CFE_SLOTH_ini | filesystem | /ngen/cfe_sloth.ini | ab0c3dff59c4b282b172b90128159fda3386d012 | | ||
### Forcings | ||
| Name of Forcings | type | PATH to forcings file(s) | HASH of file(s) | | ||
| ----------- | ----------- | ----------- | ----------- | | ||
| Hurricane Zach | bucket | s3://awi-ciroh-ngen-data/AWI_001/forcings/ | 220fff8bdd3b85f23d93e73b4bc7e3bc2c7c0f35 | | ||
### Hydrofabric | ||
| Name of Hydrofabric | type | PATH to .json file | HASH of file | | ||
| ----------- | ----------- | ----------- | ----------- | | ||
| Catchment(s) File | bucket | s3://awi-ciroh-ngen-data/AWI_001/catchments.geojson | da39a3ee5e6b4b0d3255bfef95601890afd80709 | | ||
| Nexus File | bucket | s3://awi-ciroh-ngen-data/AWI_001/nexus.geojson | cae054f62f697080d822fea9c7d9c268be8b7ac9 | | ||
| Crosswalk File | bucket | s3://awi-ciroh-ngen-data/AWI_001/crosswalk.geojson | 4c39964d1e30779f9992d3c00e94a39952cb102a | | ||
|
||
All values within each section are defined for the run to evaluate when a run is unique and the change that makes it so. For example changing the hash of any of the fields for the referenced files should change the RUN CONFIG into something new. | ||
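
One way to make "a change in any value yields a new RUN CONFIG" concrete is to derive a single fingerprint from the whole structure. The sketch below is an assumption about how that could be done, not part of the proposal: `run_fingerprint` is a hypothetical name, SHA-256 is an arbitrary choice, and the config is serialized with sorted keys so that key order alone does not change the result.

```python
import copy
import hashlib
import json


def run_fingerprint(run_config):
    """Hash a canonical JSON form of the config (sorted keys, no extra spaces)."""
    canonical = json.dumps(run_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {
    "Model": {
        "realization": {
            "Name": "CFE_SLOTH_realization",
            "Type": "filesystem",
            "Path": "/ngen/realization.json",
            "Hash": "d2c6fbda93c134de495d69745fae11087784d2aa",
        }
    },
    "Forcings": {
        "inputs": {
            "Name": "Hurricane Zach",
            "Type": "bucket",
            "Path": "s3://awi-ciroh-ngen-data/AWI_001/forcings/",
            "Hash": "220fff8bdd3b85f23d93e73b4bc7e3bc2c7c0f35",
        }
    },
}

# Same content, different key order: should be the same run.
reordered = {"Forcings": base["Forcings"], "Model": base["Model"]}

# One file hash changed: should be a different run.
changed = copy.deepcopy(base)
changed["Forcings"]["inputs"]["Hash"] = "0" * 40

print(run_fingerprint(base) == run_fingerprint(reordered))  # True
print(run_fingerprint(base) == run_fingerprint(changed))    # False
```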
|
||
Note: the Realization, Catchment, Nexus and other model-required files must be searchable by name, so each file name should include the kind of file it is, as in the table below. | ||
|
||
| Valid | Not Valid | | ||
| ----- | --------- | | ||
| hurricane_marty_realization.json | hurricane_marty.json | | ||
| Houston_catchments.geojson | Houston.geojson | | ||
| Nexus_2012_flood.geojson | 2012_flood.geojson | |
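
A small sketch of a check that matches the table above, assuming the rule is "the file name contains exactly one of the keywords `realization`, `catchment` or `nexus`, ignoring case". The keyword list and the function name `kind_from_name` are assumptions made for illustration; the proposal does not spell out the matching rule.

```python
KINDS = ("realization", "catchment", "nexus")


def kind_from_name(filename):
    """Return the kind named in the file name, or None if it is absent or ambiguous."""
    lowered = filename.lower()
    matches = [kind for kind in KINDS if kind in lowered]
    return matches[0] if len(matches) == 1 else None


for name in (
    "hurricane_marty_realization.json",
    "Houston_catchments.geojson",
    "Nexus_2012_flood.geojson",
    "hurricane_marty.json",
    "Houston.geojson",
    "2012_flood.geojson",
):
    print(name, "->", kind_from_name(name))
```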
Review comment: Not sure if this file was intended to make it into the proposal? It seems like some of this content could be reshaped a bit and make its way into the README instead? Maybe we do want a living notes document though. Although, a github discussion thread might be more appropriate and accomplish the same goal. Thoughts? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Notes on a "NextGen Universal Package" | ||
|
||
There's been some discussion in the community about the combinatorial effects of opening up the possibility to build ever-more-complicated systems while making it more difficult to resproduce results in the community, or to share results for integration into our collective understanding of the world. | ||
Review comment: small typo! resproduce -> reproduce |
||
|
||
The first step to building that collective understanding is based on having the same interpretation and categorization of the data we have and will produce. | ||
|
||
Firstly there must be a hierarchical underestanding of how we represent data and configurations. Note that these aren't 1:1 with the data and configurations; these are for the hierarchical framing of how those are organized. | ||
Review comment: small typo! underestanding -> understanding |
||
|
||
The identified components are as follows: | ||
|
||
- _Data_ | ||
- Hydrofabric | ||
- Underlying physical understanding of the domain | ||
- Forcings | ||
      - Input data over a temporal and spatial domain | ||
- _Configuration_ | ||
- Realization of model allocation | ||
- Models (version of BMI-compatible model) | ||
- Model configurations | ||
|
||
Each Instance of the combination of these things is hereafter called a RUN and the grouping of all the aforementioned items necessitates a standardized RUN CONFIG. | ||
|
||
A preliminary Specification for NextGen RUN CONFIGs has been proposed. |
Review comment: I likely just don't know the context, so this might be a moot point, but what is a `RUN_CONFIG`? Can we just remove it? If not, can we provide some supporting text to say what a `RUN_CONFIG` is?