Cli refactor (#44)
* refactor file_paths to use properties / attributes

* refactor cli, remove csv input

* add plotting cli to readme

* only set this module's logs to debug

* fix eval import, add output folder path to logs
JoshCu authored Sep 9, 2024
1 parent 4f355f7 commit 1c0456e
Showing 15 changed files with 407 additions and 575 deletions.
133 changes: 47 additions & 86 deletions README.md
@@ -35,8 +35,9 @@ This tool prepares data to run a next gen simulation by creating a run package t
python3 -m venv env
source env/bin/activate
# installing and running the tool
-pip install ngiab_data_preprocess
+pip install ngiab_data_preprocess[plot] # [plot] needed to install the evaluation and plotting module
python -m map_app
+# CLI instructions at the bottom of the README
```

The first time you run this command, it will download the hydrofabric and model parameter files from Lynker Spatial. If you already have them, place `conus.gpkg` and `model_attributes.parquet` into `modules/data_sources/`.
@@ -82,7 +83,7 @@ To use the tool:

Once all the steps are finished, you can run NGIAB on the folder shown underneath the subset button.

-**Note:** When using the tool, the output will be stored in the `./output/<your-first-catchment>/` folder. There is no overwrite protection on the folders.
+**Note:** When using the tool, the output will be stored in the `./output/<your-input-feature>/` folder. There is no overwrite protection on the folders.

# CLI Documentation

@@ -92,109 +93,69 @@ Once all the steps are finished, you can run NGIAB on the folder shown underneath
## Arguments

- `-h`, `--help`: Show the help message and exit.
-- `-i INPUT_FILE`, `--input_file INPUT_FILE`: Path to a CSV or TXT file containing a list of catchment IDs, lat/lon pairs, or gage IDs; or a single catchment ID (e.g., `cat-5173`), a single lat/lon pair, or a single gage ID.
-- `-l`, `--latlon`: Use latitude and longitude instead of catchment IDs. When used with `-i`, the file should contain lat/lon pairs.
-- `-g`, `--gage`: Use gage IDs instead of catchment IDs. When used with `-i`, the file should contain gage IDs.
-- `-s`, `--subset`: Subset the hydrofabric to the given catchment IDs, locations, or gage IDs.
-- `-f`, `--forcings`: Generate forcings for the given catchment IDs, locations, or gage IDs.
-- `-r`, `--realization`: Create a realization for the given catchment IDs, locations, or gage IDs.
-- `--start_date START_DATE`: Start date for forcings/realization (format YYYY-MM-DD).
-- `--end_date END_DATE`: End date for forcings/realization (format YYYY-MM-DD).
-- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the subset to be created (default is the first catchment ID in the input file).
+- `-i INPUT_FEATURE`, `--input_feature INPUT_FEATURE`: ID of feature to subset. Providing a prefix will automatically convert to catid, e.g., cat-5173 or gage-01646500 or wb-1234.
+- `-l`, `--latlon`: Use latitude and longitude instead of catid. Expects comma-separated values via the CLI, e.g., `python -m ngiab_data_cli -i 54.33,-69.4 -l -s`.
+- `-g`, `--gage`: Use gage ID instead of catid. Expects a single gage ID via the CLI, e.g., `python -m ngiab_data_cli -i 01646500 -g -s`.
+- `-s`, `--subset`: Subset the hydrofabric to the given feature.
+- `-f`, `--forcings`: Generate forcings for the given feature.
+- `-r`, `--realization`: Create a realization for the given feature.
+- `--start_date START_DATE`, `--start START_DATE`: Start date for forcings/realization (format YYYY-MM-DD).
+- `--end_date END_DATE`, `--end END_DATE`: End date for forcings/realization (format YYYY-MM-DD).
+- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the output folder.
+- `-D`, `--debug`: Enable debug logging.
+- `--run`: Automatically run Next Gen against the output folder.
+- `--validate`: Run every missing step required to run ngiab.
+- `--eval`: Evaluate performance of the model after running and plot streamflow at USGS gages.
+- `-a`, `--all`: Run all operations: subset, forcings, realization, run Next Gen, and evaluate.

+## Usage Notes
+
+- If your input has a prefix of `gage-`, you do not need to pass `-g`.
+- The `-l`, `-g`, `-s`, `-f`, `-r` flags can be combined like normal CLI flags. For example, to subset, generate forcings, and create a realization, you can use `-sfr` or `-s -f -r`.
+- When using the `--all` flag, it automatically sets `subset`, `forcings`, `realization`, `run`, and `eval` to `True`.
+- Using the `--run` flag automatically sets the `--validate` flag.

## Examples

-`-l`, `-g`, `-s`, `-f`, `-r` can be combined like normal CLI flags. For example, to subset, generate forcings, and create a realization, you can use `-sfr` or `-s -f -r`.

-1. Subset hydrofabric using catchment IDs:
-```
-python -m ngiab_data_cli -i catchment_ids.txt -s
+1. Subset hydrofabric using catchment ID:
+```bash
+python -m ngiab_data_cli -i cat-7080 -s
```

2. Generate forcings using a single catchment ID:
-```
-python -m ngiab_data_cli -i cat-5173 -f --start_date 2023-01-01 --end_date 2023-12-31
+```bash
+python -m ngiab_data_cli -i cat-5173 -f --start 2022-01-01 --end 2022-02-28
```

-3. Create realization using lat/lon pairs from a CSV file:
-```
-python -m ngiab_data_cli -i locations.csv -l -r --start_date 2023-01-01 --end_date 2023-12-31 -o custom_output
+3. Create realization using a lat/lon pair and output to a named folder:
+```bash
+python -m ngiab_data_cli -i 54.33,-69.4 -l -r --start 2022-01-01 --end 2022-02-28 -o custom_output
```

-4. Perform all operations using a single lat/lon pair:
-```
-python -m ngiab_data_cli -i 54.33,-69.4 -l -s -f -r --start_date 2023-01-01 --end_date 2023-12-31
+4. Perform all operations using a lat/lon pair:
+```bash
+python -m ngiab_data_cli -i 54.33,-69.4 -l -s -f -r --start 2022-01-01 --end 2022-02-28
```

-5. Subset hydrofabric using gage IDs from a CSV file:
-```
-python -m ngiab_data_cli -i gage_ids.csv -g -s
+5. Subset hydrofabric using gage ID:
+```bash
+python -m ngiab_data_cli -i 10154200 -g -s
+# or
+python -m ngiab_data_cli -i gage-10154200 -s
```

6. Generate forcings using a single gage ID:
-```
-python -m ngiab_data_cli -i 01646500 -g -f --start_date 2023-01-01 --end_date 2023-12-31
+```bash
+python -m ngiab_data_cli -i 01646500 -g -f --start 2022-01-01 --end 2022-02-28
```

-## File Formats
-
-### 1. Catchment ID input:
-- CSV file: A single column of catchment IDs, or a column named 'cat_id', 'catchment_id', or 'divide_id'.
-- TXT file: One catchment ID per line.
-
-Example CSV (catchment_ids.csv):
-```
-cat_id,soil_type
-cat-5173,some
-cat-5174,data
-cat-5175,here
-```
-Or:
-```
-cat-5173
-cat-5174
-cat-5175
-```
-
-### 2. Lat/Lon input:
-- CSV file: Two columns named 'lat' and 'lon', or two unnamed columns in that order.
-- Single pair: Comma-separated values passed directly to the `-i` argument.
-
-Example CSV (locations.csv):
-```
-lat,lon
-54.33,-69.4
-55.12,-68.9
-53.98,-70.1
-```
-Or:
-```
-54.33,-69.4
-55.12,-68.9
-53.98,-70.1
-```
-
-### 3. Gage ID input:
-- CSV file: A single column of gage IDs, or a column named 'gage' or 'gage_id'.
-- TXT file: One gage ID per line.
-- Single gage ID: Passed directly to the `-i` argument.
-
-Example CSV (gage_ids.csv):
-```
-gage_id,station_name
-01646500,Potomac River
-01638500,Shenandoah River
-01578310,Susquehanna River
-```
-Or:
-```
-01646500
-01638500
-01578310
-```
+7. Run all operations, including Next Gen and evaluation/plotting:
+```bash
+python -m ngiab_data_cli -i cat-5173 -a --start 2022-01-01 --end 2022-02-28
+```

## Output

-The script creates an output folder named after the first catchment ID in the input file, the provided output name, or derived from the first lat/lon pair or gage ID. This folder will contain the results of the subsetting, forcings generation, and realization creation operations.
+The script creates an output folder named after the first catchment ID in the input file, the provided output name, or derived from the first lat/lon pair or gage ID. This folder will contain the results of the subsetting, forcings generation, realization creation, Next Gen run (if applicable), and evaluation (if applicable) operations.

</details>
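The flag bundling described in the usage notes (`-sfr` behaving like `-s -f -r`) is standard behavior for short boolean options in Python's `argparse`. A minimal sketch with a simplified parser (this is an illustration, not the project's actual CLI definition):

```python
import argparse

# Simplified stand-in for the ngiab_data_cli argument parser,
# showing only the three combinable boolean flags.
parser = argparse.ArgumentParser(prog="ngiab_data_cli")
parser.add_argument("-s", "--subset", action="store_true")
parser.add_argument("-f", "--forcings", action="store_true")
parser.add_argument("-r", "--realization", action="store_true")

# Short store_true flags can be bundled into one token.
args = parser.parse_args(["-sfr"])
print(args.subset, args.forcings, args.realization)  # True True True
```

Parsing `["-s", "-f", "-r"]` produces the same namespace, which is why the README treats the two spellings as interchangeable.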
22 changes: 11 additions & 11 deletions modules/data_processing/create_realization.py
@@ -108,7 +108,7 @@ def make_noahowp_config(
divide_conf_df.set_index("divide_id", inplace=True)
start_datetime = start_time.strftime("%Y%m%d%H%M")
end_datetime = end_time.strftime("%Y%m%d%H%M")
-with open(file_paths.template_noahowp_config(), "r") as file:
+with open(file_paths.template_noahowp_config, "r") as file:
template = file.read()

cat_config_dir = base_dir / "cat_config" / "NOAH-OWP-M"
@@ -134,7 +134,7 @@ def make_noahowp_config(
def configure_troute(
cat_id: str, config_dir: Path, start_time: datetime, end_time: datetime
) -> int:
-with open(file_paths.template_troute_config(), "r") as file:
+with open(file_paths.template_troute_config, "r") as file:
troute = yaml.safe_load(file) # Use safe_load for loading

time_step_size = troute["compute_parameters"]["forcing_parameters"]["dt"]
@@ -170,7 +170,7 @@ def configure_troute(
def make_ngen_realization_json(
config_dir: Path, start_time: datetime, end_time: datetime, nts: int
) -> None:
-with open(file_paths.template_realization_config(), "r") as file:
+with open(file_paths.template_realization_config, "r") as file:
realization = json.load(file)

realization["time"]["start_time"] = start_time.strftime("%Y-%m-%d %H:%M:%S")
@@ -188,18 +188,18 @@ def create_realization(cat_id: str, start_time: datetime, end_time: datetime):
paths = file_paths(cat_id)

# make cfe init config files
-cfe_atts_path = paths.config_dir() / "cfe_noahowp_attributes.csv"
+cfe_atts_path = paths.config_dir / "cfe_noahowp_attributes.csv"
catchment_configs = parse_cfe_parameters(pandas.read_csv(cfe_atts_path))
-make_catchment_configs(paths.config_dir(), catchment_configs)
+make_catchment_configs(paths.config_dir, catchment_configs)

# make NOAH-OWP-Modular config files
-make_noahowp_config(paths.config_dir(), cfe_atts_path, start_time, end_time)
+make_noahowp_config(paths.config_dir, cfe_atts_path, start_time, end_time)

# make troute config files
-num_timesteps = configure_troute(cat_id, paths.config_dir(), start_time, end_time)
+num_timesteps = configure_troute(cat_id, paths.config_dir, start_time, end_time)

# create the realization
-make_ngen_realization_json(paths.config_dir(), start_time, end_time, num_timesteps)
+make_ngen_realization_json(paths.config_dir, start_time, end_time, num_timesteps)

# create some partitions for parallelization
paths.setup_run_folders()
@@ -210,7 +210,7 @@ def create_partitions(paths: Path, num_partitions: int = None) -> None:
if num_partitions is None:
num_partitions = multiprocessing.cpu_count()

-cat_to_nex_pairs = get_cat_to_nex_flowpairs(hydrofabric=paths.geopackage_path())
+cat_to_nex_pairs = get_cat_to_nex_flowpairs(hydrofabric=paths.geopackage_path)
print(f"Creating {num_partitions} partitions for {len(cat_to_nex_pairs)} catchments.")
nexus = defaultdict(list)

@@ -234,11 +234,11 @@ def create_partitions(paths: Path, num_partitions: int = None) -> None:
# part["nex-ids"].append(nexus[j][0])
# partitions.append(part)

-# with open(paths.subset_dir() / f"partitions_{num_partitions}.json", "w") as f:
+# with open(paths.subset_dir / f"partitions_{num_partitions}.json", "w") as f:
# f.write(json.dumps({"partitions": partitions}, indent=4))

# write this to a metadata file to save on repeated file io to recalculate
-with open(paths.metadata_dir() / "num_partitions", "w") as f:
+with open(paths.metadata_dir / "num_partitions", "w") as f:
f.write(str(num_partitions))
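The changes in this file all follow one pattern: `file_paths` accessors such as `config_dir()` become read-only `@property` attributes, so call sites drop the parentheses. A minimal sketch of that refactor pattern (class and path names are simplified illustrations, not the project's actual implementation):

```python
from pathlib import Path


class FilePaths:
    """Sketch of a path container using properties instead of getter methods."""

    def __init__(self, cat_id: str, root: Path = Path("output")):
        self.cat_id = cat_id
        self.root = root

    @property
    def subset_dir(self) -> Path:
        # Still computed on every access, but read like a plain attribute.
        return self.root / self.cat_id

    @property
    def config_dir(self) -> Path:
        return self.subset_dir / "config"


paths = FilePaths("cat-5173")
print(paths.config_dir)  # attribute access, no parentheses
```

The property form keeps lazy computation while making call sites read like simple attribute lookups, which is exactly the change applied across `create_realization.py` above.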

