NextGen Water Modeling Framework Datastream

ngen-datastream automates the process of collecting and formatting input data for NextGen, orchestrating the NextGen run through NextGen In a Box (NGIAB), and handling outputs. This software allows users to run NextGen in an efficient, relatively painless, and reproducible fashion.

Getting Started

Installation: Follow the step-by-step instructions in the Installation Guide to set up ngen-datastream on your system.
Usage: Learn how to use ngen-datastream effectively by referring to the comprehensive Usage Guide.

Run it

ngen-datastream can be executed using CLI args or a configuration file. Not all arguments are required.

> cd ngen-datastream && ./scripts/stream.sh --help

Usage: ./scripts/stream.sh [options]
Either provide a datastream configuration file
  -c, --CONF_FILE           <Path to datastream configuration file> 
or run with cli args
  -s, --START_DATE          <YYYYMMDDHHMM or "DAILY"> 
  -e, --END_DATE            <YYYYMMDDHHMM> 
  -C, --FORCING_SOURCE      <Forcing source option> 
  -D, --DOMAIN_NAME         <Name for spatial domain> 
  -g, --GEOPACKAGE          <Path to geopackage file> 
  -I, --SUBSET_ID           <Hydrofabric id to subset>  
  -i, --SUBSET_ID_TYPE      <Hydrofabric id type>  
  -v, --HYDROFABRIC_VERSION <Hydrofabric version> 
  -R, --REALIZATION         <Path to realization file> 
  -d, --DATA_DIR            <Path to write to> 
  -r, --RESOURCE_DIR        <Path to resource directory> 
  -f, --NWM_FORCINGS_DIR    <Path to nwm forcings directory> 
  -F, --NGEN_FORCINGS       <Path to ngen forcings directory, tarball, or netcdf> 
  -N, --NGEN_BMI_CONFS      <Path to ngen BMI config directory> 
  -S, --S3_MOUNT            <Path to mount s3 bucket to>  
  -o, --S3_PREFIX           <File prefix within s3 mount> 
  -n, --NPROCS              <Process limit> 
  -y, --DRYRUN              <True to skip calculations>
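Each short flag in the help text corresponds to a variable of the same name in `CONF_FILE`. As a minimal sketch, assuming you want to assemble a `stream.sh` command from Python (for example, in a batch script), that correspondence can be kept in a lookup table. `FLAGS` and `build_command` are hypothetical names, not part of ngen-datastream; only the flags themselves are copied from the help output above.

```python
import shlex

# Short flags from `./scripts/stream.sh --help`, keyed by the matching CONF_FILE variable name.
FLAGS = {
    "CONF_FILE": "-c",
    "START_DATE": "-s",
    "END_DATE": "-e",
    "FORCING_SOURCE": "-C",
    "DOMAIN_NAME": "-D",
    "GEOPACKAGE": "-g",
    "SUBSET_ID": "-I",
    "SUBSET_ID_TYPE": "-i",
    "HYDROFABRIC_VERSION": "-v",
    "REALIZATION": "-R",
    "DATA_DIR": "-d",
    "RESOURCE_DIR": "-r",
    "NWM_FORCINGS_DIR": "-f",
    "NGEN_FORCINGS": "-F",
    "NGEN_BMI_CONFS": "-N",
    "S3_MOUNT": "-S",
    "S3_PREFIX": "-o",
    "NPROCS": "-n",
    "DRYRUN": "-y",
}


def build_command(variables: dict) -> list[str]:
    """Translate {VARIABLE: value} into a stream.sh argument list, skipping unset (None) values."""
    cmd = ["./scripts/stream.sh"]
    for name, value in variables.items():
        if name not in FLAGS:
            raise KeyError(f"unknown variable: {name}")
        if value is not None:
            cmd += [FLAGS[name], str(value)]
    return cmd


cmd = build_command({
    "START_DATE": "202006200100",
    "END_DATE": "202006210000",
    "FORCING_SOURCE": "NWM_RETRO_V3",
    "NPROCS": 4,
})
print(shlex.join(cmd))
# ./scripts/stream.sh -s 202006200100 -e 202006210000 -C NWM_RETRO_V3 -n 4
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Values are converted with `str()`, so `DRYRUN` should be given as the string `"True"`, as documented.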

First, obtain a hydrofabric file for the gage you wish to model. For example, for Palisade, Colorado:

hfsubset -w medium_range -s nextgen -v 2.1.1 -l divides,flowlines,network,nexus,forcing-weights,flowpath-attributes,model-attributes -o palisade.gpkg -t hl "Gages-09106150"

Then feed the hydrofabric file to ngen-datastream along with a few CLI args that define the time domain and NextGen configuration. This command will execute a 24-hour NextGen simulation over the catchments upstream of the Palisade gage with CFE, SLOTH, PET, NOM, and t-route configuration, distributed over 4 processes. See more examples.

./scripts/stream.sh -s 202006200100 -e 202006210000 -C NWM_RETRO_V3 -d $(pwd)/data/datastream_test -g $(pwd)/palisade.gpkg -R $(pwd)/configs/ngen/realization_sloth_nom_cfe_pet_troute.json -n 4
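The time domain above runs from 202006200100 through 202006210000. A small sketch of the date arithmetic, assuming one forcing timestep per hour with both endpoints included (consistent with the 24-hour description above), shows where the 24 comes from. `hourly_steps` is a hypothetical helper, and the `DAILY` option is not handled.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M"  # YYYYMMDDHHMM, the documented START_DATE/END_DATE format


def hourly_steps(start: str, end: str) -> list[datetime]:
    """Every hourly timestamp from start through end, both endpoints included (assumed)."""
    t0 = datetime.strptime(start, FMT)
    t1 = datetime.strptime(end, FMT)
    if t1 < t0:
        raise ValueError("END_DATE must not precede START_DATE")
    n_hours = int((t1 - t0) / timedelta(hours=1))
    return [t0 + timedelta(hours=i) for i in range(n_hours + 1)]


steps = hourly_steps("202006200100", "202006210000")
print(len(steps), steps[0], steps[-1])
# 24 2020-06-20 01:00:00 2020-06-21 00:00:00
```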

To see what's happening in ngen-datastream step-by-step, see the breakdown document.

Explanation of CLI args (or variables defined in `CONF_FILE`)

Field	Description	Required
START_DATE	Start simulation time (YYYYMMDDHHMM) or "DAILY"	✅
END_DATE	End simulation time (YYYYMMDDHHMM)	✅
FORCING_SOURCE	Select the forcings data provider. Options include NWM_RETRO_V2, NWM_RETRO_V3, NWM_OPERATIONAL_V3, NOMADS_OPERATIONAL	✅
DOMAIN_NAME	Name for the spatial domain in the run. Taken from the geopackage filename if not supplied.
GEOPACKAGE	Path to hydrofabric, can be s3URI, URL, or local file. Generate file with hfsubset or use SUBSET args.	Required here or file exists in `RESOURCE_DIR/config`
SUBSET_ID_TYPE	ID type of SUBSET_ID (e.g. hl). See hfsubset for options.	Required here if user is not providing GEOPACKAGE and GEOPACKAGE_ATTR.
SUBSET_ID	Catchment ID to subset. See hfsubset for options.	Required here if user is not providing GEOPACKAGE and GEOPACKAGE_ATTR.
HYDROFABRIC_VERSION	Hydrofabric version, $\geq$ v20.1. See hfsubset for options.	Required here if user is not providing GEOPACKAGE and GEOPACKAGE_ATTR.
REALIZATION	Path to NextGen realization file	Required here or file exists in `RESOURCE_DIR/config`
DATA_DIR	Absolute local path to construct the datastream run.	✅
RESOURCE_DIR	Path to directory that contains the datastream resources. More explanation here.
NWM_FORCINGS_DIR	Path to local directory containing NWM files. Alternatively, these files could be stored in RESOURCE_DIR as nwm-forcings.
NGEN_BMI_CONFS	Path to local directory containing NextGen BMI configuration files. Alternatively, these files could be stored in RESOURCE_DIR under `config/`. See here for directory structure.
NGEN_FORCINGS	Path to local ngen forcings directory holding ngen forcing CSV or Parquet files. Also accepts a tarball or NetCDF file. Alternatively, these file(s) could be stored in RESOURCE_DIR at `ngen-forcings/`.
S3_MOUNT	Path to mount S3 bucket to. `ngen-datastream` will copy outputs here.
S3_PREFIX	Prefix to prepend to all files when copying to s3
DRYRUN	Set to "True" to skip all compute steps.
NPROCS	Maximum number of processes to use in any step of `ngen-datastream`. Defaults to `nprocs - 2`
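The "Required" column can be read as a pre-flight check. The sketch below is a hypothetical `validate_args` helper, not the logic in `stream.sh`: it assumes arguments arrive as a dict keyed by the variable names above, treats a `realization.json` in `RESOURCE_DIR/config` as satisfying `REALIZATION`, does not check the date format, and uses `os.cpu_count()` as a stand-in for `nprocs` when applying the documented `NPROCS` default.

```python
import os
from pathlib import Path

# Values from the table above.
FORCING_SOURCES = {"NWM_RETRO_V2", "NWM_RETRO_V3", "NWM_OPERATIONAL_V3", "NOMADS_OPERATIONAL"}
ALWAYS_REQUIRED = ("START_DATE", "END_DATE", "FORCING_SOURCE", "DATA_DIR")
SUBSET_ARGS = ("SUBSET_ID", "SUBSET_ID_TYPE", "HYDROFABRIC_VERSION")


def validate_args(args: dict) -> dict:
    """Hypothetical pre-flight check of the 'Required' column; not stream.sh's own logic."""
    missing = [k for k in ALWAYS_REQUIRED if not args.get(k)]
    if missing:
        raise ValueError(f"missing required args: {missing}")
    if args["FORCING_SOURCE"] not in FORCING_SOURCES:
        raise ValueError(f"unknown FORCING_SOURCE: {args['FORCING_SOURCE']}")

    # GEOPACKAGE and REALIZATION may instead live in RESOURCE_DIR/config.
    cfg_dir = Path(args["RESOURCE_DIR"]) / "config" if args.get("RESOURCE_DIR") else None

    def in_resources(pattern: str) -> bool:
        return cfg_dir is not None and cfg_dir.is_dir() and any(cfg_dir.glob(pattern))

    has_subset = all(args.get(k) for k in SUBSET_ARGS)
    if not (args.get("GEOPACKAGE") or in_resources("*.gpkg") or has_subset):
        raise ValueError("need GEOPACKAGE, a .gpkg in RESOURCE_DIR/config, or all SUBSET args")
    # Assumes the resource-directory realization is named realization.json, as in the examples.
    if not (args.get("REALIZATION") or in_resources("realization.json")):
        raise ValueError("need REALIZATION or RESOURCE_DIR/config/realization.json")

    checked = dict(args)
    if not checked.get("NPROCS"):
        # Documented default is (processors - 2); the floor of 1 is an assumption.
        checked["NPROCS"] = max(1, (os.cpu_count() or 1) - 2)
    return checked


args = validate_args({
    "START_DATE": "202006200100",
    "END_DATE": "202006210000",
    "FORCING_SOURCE": "NWM_RETRO_V3",
    "DATA_DIR": "/tmp/datastream_test",
    "GEOPACKAGE": "palisade.gpkg",
    "REALIZATION": "realization_sloth_nom_cfe_pet_troute.json",
})
print(args["NPROCS"])  # depends on the machine
```

Supplying `SUBSET_ID`, `SUBSET_ID_TYPE`, and `HYDROFABRIC_VERSION` instead of `GEOPACKAGE` also satisfies the geopackage check in this sketch.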

`ngen-datastream` Output Directory Structure

When the datastream is executed, a folder with the structure below will be constructed at DATA_DIR.

DATA_DIR/
│
├── datastream-metadata/
│
├── datastream-resources/
|
├── ngen-run/

Each folder is explained below.

`datastream-metadata/`

Holds metadata about the ngen-datastream execution, giving a relatively condensed view of how the execution was performed. Example directory:

datastream-metadata/
│
├── conf_datastream.json
│
├── conf_fp.json
|
├── conf_nwmurl.json
|
├── profile_fp.txt
|
├── profile.txt
|
├── filenamelist.txt
|
├── realization.json

File Type	Path in DATA_DIR	Description	Naming
DATASTREAM CONFIGURATION	datastream-metadata/conf_datastream.json	Holds metadata about the execution	conf_datastream.json
FORCING PROCESSOR CONFIGURATION	datastream-metadata/conf_fp.json	Configuration file for forcingprocessor. See here	conf_fp.json
NWM URL CONFIGURATION	datastream-metadata/conf_nwmurl.json	Configuration file for nwmurl. See here	conf_nwmurl.json
PROFILE	datastream-metadata/profile_fp.txt	Datetime print statements that allow for profiling each step in forcingprocessor	profile_fp.txt
PROFILE	datastream-metadata/profile.txt	Datetime print statements that allow for profiling each step in datastream	profile.txt
FILENAME LIST	datastream-metadata/filenamelist.txt	Local file paths or URLs to NWM forcings. Generated by nwmurl.	filenamelist.txt
REALIZATION	datastream-metadata/realization.json	NextGen configuration file. See here	realization.json
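Because every file in `datastream-metadata/` is either JSON or plain text, a quick summary of a finished run can be printed with the standard library alone. This is a minimal sketch, assuming the JSON configuration files are objects at the top level; `summarize_metadata` and the example path are hypothetical.

```python
import json
from pathlib import Path


def summarize_metadata(meta_dir: Path) -> dict:
    """Condensed view of datastream-metadata/: JSON top-level keys and text-file line counts."""
    summary = {}
    for path in sorted(meta_dir.iterdir()):
        if path.suffix == ".json":
            data = json.loads(path.read_text())
            # Report keys for JSON objects; otherwise just report the JSON type.
            summary[path.name] = sorted(data) if isinstance(data, dict) else type(data).__name__
        elif path.suffix == ".txt":
            # profile.txt, profile_fp.txt, filenamelist.txt, ...
            summary[path.name] = len(path.read_text().splitlines())
    return summary


# Hypothetical location: DATA_DIR/datastream-metadata from the example run above.
meta = Path("data/datastream_test/datastream-metadata")
if meta.is_dir():
    for name, info in summarize_metadata(meta).items():
        print(f"{name}: {info}")
```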

`RESOURCE_DIR` (`datastream-resources/`)

datastream-resources/ holds all the input data files required for the computations ngen-datastream performs. This folder is not required as input, but supplying one makes repeated runs of ngen-datastream over a given spatial or time domain faster.

Examples of the application of the resource directory:

Repeated executions. ngen-datastream will retrieve files given as arguments from remote sources, but this can take time depending on the network connection between the data source and the host. Storing these files locally in RESOURCE_DIR for repeated runs saves time and network bandwidth. It also saves the compute required to build input files from scratch.
Communicating runs. ngen-datastream versions everything in DATA_DIR, so a single hash corresponds to a unique RESOURCE_DIR. This allows users to quickly identify potential differences between ngen-datastream input data.
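For repeated executions, reusing a resource directory can be as simple as keeping the `datastream-resources/` folder from an earlier run. A minimal sketch using only the standard library; `save_resources` and the paths in the example are hypothetical.

```python
import shutil
from pathlib import Path


def save_resources(data_dir: Path, dest: Path) -> Path:
    """Copy DATA_DIR/datastream-resources to dest for use as RESOURCE_DIR in later runs."""
    src = data_dir / "datastream-resources"
    if not src.is_dir():
        raise FileNotFoundError(f"no datastream-resources/ under {data_dir}")
    # copytree raises FileExistsError if dest exists, so an existing
    # resource directory is never silently merged or overwritten.
    shutil.copytree(src, dest)
    return dest
```

For example, after the Palisade run above, `save_resources(Path("data/datastream_test"), Path("resources/palisade"))` keeps a copy that a later run can reference with `-r $(pwd)/resources/palisade`.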

Guide for building a `RESOURCE_DIR`

The easiest way to create a reusable resource directory is to execute ngen-datastream and save DATA_DIR/datastream-resources for later use. A user defined RESOURCE_DIR may take the form below. Only one file of each type is allowed (e.g. cannot have two geopackages or two realizations). Not every file is required. ngen-datastream will generate all required files by default, but will skip those steps if corresponding files exist in the resource directory.

RESOURCE_DIR/
|
├── config/
|   │
|   ├── nextgen_09.gpkg
|   |
|   ├── realization.json
|   |
|   ├── ngen.yaml
|   |
|   ├── partitions.json
|   |
|   ├── cat-config/
|   │   |
|   |   ├──PET/
|   │   |
|   |   ├──CFE/
|   │   |
|   |   ├──NOAH-OWP-M/
|
├── nwm-forcings/
|   |
|   ├── nwm.t00z.medium_range.forcing.f001.conus
|   |
|   ├── ...
|
├── ngen-forcings/
|   |
|   ├── forcings.nc
|

File Type	Path in Resource Directory	Example Link	Description	Naming
BMI CONFIGURATION	config/cat-config		directory holding BMI module configuration files defined in realization file.	See here
REALIZATION	config/realization.json	link	NextGen configuration	realization.json
GEOPACKAGE	config/nextgen_01.gpkg	link	Hydrofabric file of version $\geq$ v20.1. Ignored if subset hydrofabric options are set in the datastream config. See Lynker-Spatial for complete VPU geopackages or hfsubset for generating your own custom domain. `hfsubset` can also be invoked indirectly through `ngen-datastream` via the subsetting args.	*.gpkg
PARTITIONS	config/partitions_$NPROCS.json		File generated by the NextGen framework to distribute processing by spatial domain.	partitions.json
FORCINGS	nwm-forcings/*.nc	link	NetCDF National Water Model forcing files. These are not saved to the resource directory by default.	*.nc
FORCINGS	ngen-forcings/*.nc		NetCDF file holding ngen forcings.	.nc (by default), .tar.gz, .csv, .parquet
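The skip behavior described in the guide can be sketched as a presence check over the resource directory. This is a hypothetical illustration, not ngen-datastream's implementation: the glob patterns in `RESOURCE_PATTERNS` are assumptions derived from the tree and table above, and the one-file rule is enforced only for geopackages and realizations, the two examples the guide names.

```python
from pathlib import Path

# Assumed glob for each file type in the table above, relative to RESOURCE_DIR.
RESOURCE_PATTERNS = {
    "geopackage": "config/*.gpkg",
    "realization": "config/realization.json",
    "partitions": "config/partitions*.json",
    "bmi_configs": "config/cat-config/*",
    "nwm_forcings": "nwm-forcings/*",
    "ngen_forcings": "ngen-forcings/*",
}
# The guide's examples of file types that may appear only once.
SINGLE_FILE = {"geopackage", "realization"}


def resources_present(resource_dir: Path) -> dict:
    """Map each file type to True if RESOURCE_DIR already provides it (so its step could be skipped)."""
    present = {}
    for kind, pattern in RESOURCE_PATTERNS.items():
        matches = sorted(resource_dir.glob(pattern))
        if kind in SINGLE_FILE and len(matches) > 1:
            names = [m.name for m in matches]
            raise ValueError(f"only one {kind} is allowed in RESOURCE_DIR, found {names}")
        present[kind] = bool(matches)
    return present
```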

`ngen-run/`

Running NextGen requires building a standard run directory containing only the necessary files. The datastream constructs this automatically, but it can also be built manually. Below is an explanation of the standard. Reference for discussion of the standard here.

A NextGen run directory, ngen-run, is composed of three required subfolders (config, forcings, and outputs) and an optional fourth subfolder, metadata.

ngen-run/
│
├── config/
│
├── forcings/
|
├── metadata/
│
├── outputs/

The ngen-run directory contains the following subfolders:

config: model configuration files and hydrofabric configuration files. A deeper explanation here.
forcings: catchment-level forcing timeseries files. These can be generated with the forcingprocessor. Forcing files contain variables like wind speed, temperature, precipitation, and solar radiation.
metadata: an optional subfolder. It is generated programmatically and used internally by ngen. Do not edit this folder.
outputs: where ngen places its output files.
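The required layout is simple enough to create or check programmatically when building a run directory by hand. A minimal sketch with hypothetical helper names; it creates only the three required subfolders and leaves `metadata/` alone, since that folder is generated programmatically.

```python
from pathlib import Path

# Required subfolders; metadata/ is optional and generated programmatically, so it is not created here.
REQUIRED = ("config", "forcings", "outputs")


def make_run_dir(root: Path) -> Path:
    """Create the required ngen-run subfolders (no-op for ones that already exist)."""
    for name in REQUIRED:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def missing_subfolders(root: Path) -> list[str]:
    """Names of required subfolders that are absent, e.g. before launching ngen."""
    return [name for name in REQUIRED if not (root / name).is_dir()]
```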

Configuration directory `ngen-run/config/`

This folder contains the NextGen realization file, which serves as the primary model configuration for the ngen framework. This file specifies which models to run and with which parameters, run parameters like date and time, and hydrofabric specifications.

Based on the models defined in the realization file, BMI configuration files may be required. For models that require per-catchment configuration files, ngen-run/config/cat-config holds a folder of these files for each model. See here for the models for which ngen-datastream supports automated BMI configuration file generation. See the directory structure convention below.

ngen-run/
|
├── config/
|   │
|   ├── nextgen_09.gpkg
|   |
|   ├── realization.json
|   |
|   ├── ngen.yaml
|   |
|   ├── cat-config/
|   │   |
|   |   ├──PET/
|   │   |
|   |   ├──CFE/
|   │   |
|   |   ├──NOAH-OWP-M/
...
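To see which `cat-config/` folders a realization expects, one can collect the per-catchment config paths it references. This sketch assumes, as in typical ngen realization files, that each module's config path is stored under an `init_config` key with an `{{id}}` placeholder for the catchment; the helper names and the trimmed realization fragment are illustrative only.

```python
from pathlib import PurePosixPath


def find_init_configs(node) -> list[str]:
    """Recursively collect every string stored under an 'init_config' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "init_config" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_init_configs(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_init_configs(item))
    return found


def expected_config_dirs(realization: dict) -> list[str]:
    """Directories the per-catchment BMI config files are expected in, e.g. config/cat-config/CFE."""
    return sorted({str(PurePosixPath(p).parent) for p in find_init_configs(realization)})


# Trimmed, illustrative realization fragment (not a complete realization file).
example = {
    "global": {
        "formulations": [{
            "name": "bmi_multi",
            "params": {"modules": [
                {"params": {"model_type_name": "PET", "init_config": "./config/cat-config/PET/{{id}}.ini"}},
                {"params": {"model_type_name": "CFE", "init_config": "./config/cat-config/CFE/{{id}}.ini"}},
            ]},
        }]
    }
}
print(expected_config_dirs(example))
# ['config/cat-config/CFE', 'config/cat-config/PET']
```

To check a real run, load `ngen-run/config/realization.json` with `json.loads` and compare the result against the folders present under `ngen-run/config/cat-config/`.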

Versioning

ngen-datastream uses a Merkle tree hashing algorithm to version each execution with merkdir. All input and output files in an ngen-datastream execution are hashed in such a way that tracking minute changes among millions of files is trivial.
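The idea behind Merkle tree versioning can be shown in a few lines: each file is hashed from its bytes, and each directory is hashed from its children's names and hashes, so any change propagates up to the root while unchanged subtrees keep their hashes. This illustrates the technique with SHA-256; it is not merkdir's actual algorithm or hash format, and `hash_tree` and `changed_paths` are hypothetical names.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each path under root (relative, '.' for root itself) to its Merkle hash."""
    hashes: dict[str, str] = {}

    def visit(path: Path) -> str:
        h = hashlib.sha256()
        if path.is_file():
            # Leaf: hash the file contents in 1 MiB chunks.
            h.update(b"F")
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
        else:
            # Interior node: hash each child's name and hash, in sorted order.
            h.update(b"D")
            for child in sorted(path.iterdir(), key=lambda p: p.name):
                h.update(child.name.encode() + b"\0")
                h.update(bytes.fromhex(visit(child)))
        digest = h.hexdigest()
        hashes[path.relative_to(root).as_posix()] = digest
        return digest

    visit(root)
    return hashes


def changed_paths(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Paths that were added, removed, or whose hash differs between two trees."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Comparing two executions then starts with their root hashes; when those differ, the changed paths show which files and directories account for the difference. A production tool would typically stop descending into subtrees whose hashes match, whereas this sketch simply compares every path.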

License

ngen-datastream is distributed under the GNU General Public License v3.0 or later.
