Cli refactor (#44)
* refactor file_paths to use properties / attributes

* refactor cli, remove csv input

* add plotting cli to readme

* only set this module's logs to debug

* fix eval import, add output folder path to logs
JoshCu authored Sep 9, 2024
1 parent 4f355f7 commit 1c0456e
Showing 15 changed files with 407 additions and 575 deletions.
133 changes: 47 additions & 86 deletions README.md
@@ -35,8 +35,9 @@ This tool prepares data to run a next gen simulation by creating a run package t
python3 -m venv env
source env/bin/activate
# installing and running the tool
-pip install ngiab_data_preprocess
+pip install ngiab_data_preprocess[plot] # [plot] needed to install the evaluation and plotting module
python -m map_app
+# CLI instructions at the bottom of the README
```

The first time you run this command, it will download the hydrofabric and model parameter files from Lynker Spatial. If you already have them, place `conus.gpkg` and `model_attributes.parquet` into `modules/data_sources/`.
@@ -82,7 +83,7 @@ To use the tool:

Once all the steps are finished, you can run NGIAB on the folder shown underneath the subset button.

-**Note:** When using the tool, the output will be stored in the `./output/<your-first-catchment>/` folder. There is no overwrite protection on the folders.
+**Note:** When using the tool, the output will be stored in the `./output/<your-input-feature>/` folder. There is no overwrite protection on the folders.

# CLI Documentation

@@ -92,109 +93,69 @@ Once all the steps are finished, you can run NGIAB on the folder shown underneath
## Arguments

- `-h`, `--help`: Show the help message and exit.
-- `-i INPUT_FILE`, `--input_file INPUT_FILE`: Path to a CSV or TXT file containing a list of catchment IDs, lat/lon pairs, or gage IDs; or a single catchment ID (e.g., `cat-5173`), a single lat/lon pair, or a single gage ID.
-- `-l`, `--latlon`: Use latitude and longitude instead of catchment IDs. When used with `-i`, the file should contain lat/lon pairs.
-- `-g`, `--gage`: Use gage IDs instead of catchment IDs. When used with `-i`, the file should contain gage IDs.
-- `-s`, `--subset`: Subset the hydrofabric to the given catchment IDs, locations, or gage IDs.
-- `-f`, `--forcings`: Generate forcings for the given catchment IDs, locations, or gage IDs.
-- `-r`, `--realization`: Create a realization for the given catchment IDs, locations, or gage IDs.
-- `--start_date START_DATE`: Start date for forcings/realization (format YYYY-MM-DD).
-- `--end_date END_DATE`: End date for forcings/realization (format YYYY-MM-DD).
-- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the subset to be created (default is the first catchment ID in the input file).
+- `-i INPUT_FEATURE`, `--input_feature INPUT_FEATURE`: ID of feature to subset. Providing a prefix will automatically convert to catid, e.g., cat-5173 or gage-01646500 or wb-1234.
+- `-l`, `--latlon`: Use latitude and longitude instead of catid. Expects comma-separated values via the CLI, e.g., `python -m ngiab_data_cli -i 54.33,-69.4 -l -s`.
+- `-g`, `--gage`: Use gage ID instead of catid. Expects a single gage ID via the CLI, e.g., `python -m ngiab_data_cli -i 01646500 -g -s`.
+- `-s`, `--subset`: Subset the hydrofabric to the given feature.
+- `-f`, `--forcings`: Generate forcings for the given feature.
+- `-r`, `--realization`: Create a realization for the given feature.
+- `--start_date START_DATE`, `--start START_DATE`: Start date for forcings/realization (format YYYY-MM-DD).
+- `--end_date END_DATE`, `--end END_DATE`: End date for forcings/realization (format YYYY-MM-DD).
+- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the output folder.
+- `-D`, `--debug`: Enable debug logging.
+- `--run`: Automatically run Next Gen against the output folder.
+- `--validate`: Run every missing step required to run ngiab.
+- `--eval`: Evaluate performance of the model after running and plot streamflow at USGS gages.
+- `-a`, `--all`: Run all operations: subset, forcings, realization, run Next Gen, and evaluate.

+## Usage Notes
+
+- If your input has a prefix of `gage-`, you do not need to pass `-g`.
+- The `-l`, `-g`, `-s`, `-f`, `-r` flags can be combined like normal CLI flags. For example, to subset, generate forcings, and create a realization, you can use `-sfr` or `-s -f -r`.
+- When using the `--all` flag, it automatically sets `subset`, `forcings`, `realization`, `run`, and `eval` to `True`.
+- Using the `--run` flag automatically sets the `--validate` flag.

## Examples

-`-l`, `-g`, `-s`, `-f`, `-r` can be combined like normal CLI flags. For example, to subset, generate forcings, and create a realization, you can use `-sfr` or `-s -f -r`.

-1. Subset hydrofabric using catchment IDs:
-```
-python -m ngiab_data_cli -i catchment_ids.txt -s
+1. Subset hydrofabric using catchment ID:
+```bash
+python -m ngiab_data_cli -i cat-7080 -s
```

2. Generate forcings using a single catchment ID:
-```
-python -m ngiab_data_cli -i cat-5173 -f --start_date 2023-01-01 --end_date 2023-12-31
+```bash
+python -m ngiab_data_cli -i cat-5173 -f --start 2022-01-01 --end 2022-02-28
```

-3. Create realization using lat/lon pairs from a CSV file:
-```
-python -m ngiab_data_cli -i locations.csv -l -r --start_date 2023-01-01 --end_date 2023-12-31 -o custom_output
+3. Create realization using a lat/lon pair and output to a named folder:
+```bash
+python -m ngiab_data_cli -i 54.33,-69.4 -l -r --start 2022-01-01 --end 2022-02-28 -o custom_output
```

-4. Perform all operations using a single lat/lon pair:
-```
-python -m ngiab_data_cli -i 54.33,-69.4 -l -s -f -r --start_date 2023-01-01 --end_date 2023-12-31
+4. Perform all operations using a lat/lon pair:
+```bash
+python -m ngiab_data_cli -i 54.33,-69.4 -l -s -f -r --start 2022-01-01 --end 2022-02-28
```

-5. Subset hydrofabric using gage IDs from a CSV file:
-```
-python -m ngiab_data_cli -i gage_ids.csv -g -s
+5. Subset hydrofabric using gage ID:
+```bash
+python -m ngiab_data_cli -i 10154200 -g -s
+# or
+python -m ngiab_data_cli -i gage-10154200 -s
```

6. Generate forcings using a single gage ID:
-```
-python -m ngiab_data_cli -i 01646500 -g -f --start_date 2023-01-01 --end_date 2023-12-31
+```bash
+python -m ngiab_data_cli -i 01646500 -g -f --start 2022-01-01 --end 2022-02-28
```

-## File Formats
-
-### 1. Catchment ID input:
-- CSV file: A single column of catchment IDs, or a column named 'cat_id', 'catchment_id', or 'divide_id'.
-- TXT file: One catchment ID per line.
-
-Example CSV (catchment_ids.csv):
-```
-cat_id,soil_type
-cat-5173,some
-cat-5174,data
-cat-5175,here
-```
-Or:
-```
-cat-5173
-cat-5174
-cat-5175
-```
-
-### 2. Lat/Lon input:
-- CSV file: Two columns named 'lat' and 'lon', or two unnamed columns in that order.
-- Single pair: Comma-separated values passed directly to the `-i` argument.
-
-Example CSV (locations.csv):
-```
-lat,lon
-54.33,-69.4
-55.12,-68.9
-53.98,-70.1
-```
-Or:
-```
-54.33,-69.4
-55.12,-68.9
-53.98,-70.1
-```
-
-### 3. Gage ID input:
-- CSV file: A single column of gage IDs, or a column named 'gage' or 'gage_id'.
-- TXT file: One gage ID per line.
-- Single gage ID: Passed directly to the `-i` argument.
-
-Example CSV (gage_ids.csv):
-```
-gage_id,station_name
-01646500,Potomac River
-01638500,Shenandoah River
-01578310,Susquehanna River
-```
-Or:
-```
-01646500
-01638500
-01578310
-```
+7. Run all operations, including Next Gen and evaluation/plotting:
+```bash
+python -m ngiab_data_cli -i cat-5173 -a --start 2022-01-01 --end 2022-02-28
+```

## Output

-The script creates an output folder named after the first catchment ID in the input file, the provided output name, or derived from the first lat/lon pair or gage ID. This folder will contain the results of the subsetting, forcings generation, and realization creation operations.
+The script creates an output folder named after the first catchment ID in the input file, the provided output name, or derived from the first lat/lon pair or gage ID. This folder will contain the results of the subsetting, forcings generation, realization creation, Next Gen run (if applicable), and evaluation (if applicable) operations.

</details>
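The flag bundling described in the usage notes (`-sfr` behaving like `-s -f -r`) is standard behavior for short boolean options in Python's `argparse`. A minimal sketch with a simplified parser (this is an illustration, not the project's actual CLI definition):

```python
import argparse

# Simplified stand-in for the ngiab_data_cli argument parser,
# showing only the three combinable boolean flags.
parser = argparse.ArgumentParser(prog="ngiab_data_cli")
parser.add_argument("-s", "--subset", action="store_true")
parser.add_argument("-f", "--forcings", action="store_true")
parser.add_argument("-r", "--realization", action="store_true")

# Short store_true flags can be bundled into one token.
args = parser.parse_args(["-sfr"])
print(args.subset, args.forcings, args.realization)  # True True True
```

Parsing `["-s", "-f", "-r"]` produces the same namespace, which is why the README treats the two spellings as interchangeable.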
22 changes: 11 additions & 11 deletions modules/data_processing/create_realization.py
@@ -108,7 +108,7 @@ def make_noahowp_config(
divide_conf_df.set_index("divide_id", inplace=True)
start_datetime = start_time.strftime("%Y%m%d%H%M")
end_datetime = end_time.strftime("%Y%m%d%H%M")
-with open(file_paths.template_noahowp_config(), "r") as file:
+with open(file_paths.template_noahowp_config, "r") as file:
template = file.read()

cat_config_dir = base_dir / "cat_config" / "NOAH-OWP-M"
@@ -134,7 +134,7 @@ def make_noahowp_config(
def configure_troute(
cat_id: str, config_dir: Path, start_time: datetime, end_time: datetime
) -> int:
-with open(file_paths.template_troute_config(), "r") as file:
+with open(file_paths.template_troute_config, "r") as file:
troute = yaml.safe_load(file) # Use safe_load for loading

time_step_size = troute["compute_parameters"]["forcing_parameters"]["dt"]
@@ -170,7 +170,7 @@ def configure_troute(
def make_ngen_realization_json(
config_dir: Path, start_time: datetime, end_time: datetime, nts: int
) -> None:
-with open(file_paths.template_realization_config(), "r") as file:
+with open(file_paths.template_realization_config, "r") as file:
realization = json.load(file)

realization["time"]["start_time"] = start_time.strftime("%Y-%m-%d %H:%M:%S")
@@ -188,18 +188,18 @@ def create_realization(cat_id: str, start_time: datetime, end_time: datetime):
paths = file_paths(cat_id)

# make cfe init config files
-cfe_atts_path = paths.config_dir() / "cfe_noahowp_attributes.csv"
+cfe_atts_path = paths.config_dir / "cfe_noahowp_attributes.csv"
catchment_configs = parse_cfe_parameters(pandas.read_csv(cfe_atts_path))
-make_catchment_configs(paths.config_dir(), catchment_configs)
+make_catchment_configs(paths.config_dir, catchment_configs)

# make NOAH-OWP-Modular config files
-make_noahowp_config(paths.config_dir(), cfe_atts_path, start_time, end_time)
+make_noahowp_config(paths.config_dir, cfe_atts_path, start_time, end_time)

# make troute config files
-num_timesteps = configure_troute(cat_id, paths.config_dir(), start_time, end_time)
+num_timesteps = configure_troute(cat_id, paths.config_dir, start_time, end_time)

# create the realization
-make_ngen_realization_json(paths.config_dir(), start_time, end_time, num_timesteps)
+make_ngen_realization_json(paths.config_dir, start_time, end_time, num_timesteps)

# create some partitions for parallelization
paths.setup_run_folders()
@@ -210,7 +210,7 @@ def create_partitions(paths: Path, num_partitions: int = None) -> None:
if num_partitions is None:
num_partitions = multiprocessing.cpu_count()

-cat_to_nex_pairs = get_cat_to_nex_flowpairs(hydrofabric=paths.geopackage_path())
+cat_to_nex_pairs = get_cat_to_nex_flowpairs(hydrofabric=paths.geopackage_path)
print(f"Creating {num_partitions} partitions for {len(cat_to_nex_pairs)} catchments.")
nexus = defaultdict(list)

@@ -234,11 +234,11 @@ def create_partitions(paths: Path, num_partitions: int = None) -> None:
# part["nex-ids"].append(nexus[j][0])
# partitions.append(part)

-# with open(paths.subset_dir() / f"partitions_{num_partitions}.json", "w") as f:
+# with open(paths.subset_dir / f"partitions_{num_partitions}.json", "w") as f:
# f.write(json.dumps({"partitions": partitions}, indent=4))

# write this to a metadata file to save on repeated file io to recalculate
-with open(paths.metadata_dir() / "num_partitions", "w") as f:
+with open(paths.metadata_dir / "num_partitions", "w") as f:
f.write(str(num_partitions))
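The changes in this file all follow one pattern: `file_paths` accessors such as `config_dir()` become read-only `@property` attributes, so call sites drop the parentheses. A minimal sketch of that refactor pattern (class and path names are simplified illustrations, not the project's actual implementation):

```python
from pathlib import Path


class FilePaths:
    """Sketch of a path container using properties instead of getter methods."""

    def __init__(self, cat_id: str, root: Path = Path("output")):
        self.cat_id = cat_id
        self.root = root

    @property
    def subset_dir(self) -> Path:
        # Still computed on every access, but read like a plain attribute.
        return self.root / self.cat_id

    @property
    def config_dir(self) -> Path:
        return self.subset_dir / "config"


paths = FilePaths("cat-5173")
print(paths.config_dir)  # attribute access, no parentheses
```

The property form keeps lazy computation while making call sites read like simple attribute lookups, which is exactly the change applied across `create_realization.py` above.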

