add data format checker for own data
OuyangWenyu committed Mar 25, 2024
1 parent 995a335 commit d18c904
Showing 10 changed files with 560 additions and 514 deletions.
51 changes: 20 additions & 31 deletions README.md
@@ -53,7 +53,26 @@ $ python -m ipykernel install --user --name xaj --display-name "xaj"

 ### Prepare data

-To use your own data to run the model, we set a data interface, here is the convention:
+To use your own data to run the model, you can prepare the data in the required format:
+
+For one basin (we only support one basin now), put the data in one CSV/TXT file.
+There are four necessary columns: "time", "prcp", "pet", and "flow". "time" is the time series, "prcp" is the precipitation, "pet" is the potential evapotranspiration, and "flow" is the observed streamflow.
+The time series should be continuous (NaN values are allowed), and the time step should be the same for all columns. The time format should be "YYYY-MM-DD HH:MM:SS", and the data should be sorted by time.
+
+You can run a checker script to see if the data is in the right format:
+
+```Shell
+$ cd hydromodel/scripts
+$ python check_data_format.py --data_file <absolute path of the data file>
+```
+
+Then, you can use the data_preprocess module to transform the data into the required format:
+
+```Shell
+$ python datapreprocess4calibrate.py --data <name of the data file> --exp <name of the directory of the prepared data>
+```
+
+The data will be transformed into the data interface; here is the convention:

 - All input data for models are three-dimensional NumPy arrays: [time, basin, variable], which means "time" series data
 for "variables" in "basins"
@@ -90,36 +109,6 @@ More details about the analysis could be seen in show_results.ipynb file. It is

 Now we only provide some simple statistical calculations.

-### How to make the sample data
-
-In this part, we briefly introduce how we prepare the sample data.
-
-Here we provide an example for some basins in [the CAMELS dataset](https://ral.ucar.edu/solutions/products/camels), a commonly used dataset for hydrological model evaluation.
-
-You can download CAMELS according to this [instruction](https://github.com/OuyangWenyu/hydrodataset).
-
-Check that you have successfully downloaded it and put it in the right place.
-
-```Shell
-$ conda activate xaj
-$ python
->>> import os
->>> from hydrodataset.camels import Camels
->>> camels = Camels(data_path=os.path.join("camels", "camels_us"), download=False, region="US")
-```
-
-If any error is raised, please see this [instruction](https://github.com/OuyangWenyu/hydrodataset) again.
-
-Then, we provide a script to transform data organized like CAMELS into the required format. You can use it like this:
-
-```Shell
-$ cd hydromodel/app
-$ python datapreprocess4calibrate.py --camels_dir <name of camels_dir> --exp <name of directory of the prepared data> --calibrate_period <calibration period> --test_period <test period> --basin_id <basin id>
-# such as: python datapreprocess4calibrate.py --camels_name camels_us --exp xxx --calibrate_period 1990-10-01 2000-10-01 --test_period 2000-10-01 2010-10-01 --basin_id 01439500 06885500 08104900 09510200
-```
-
-Then you can see some files in the hydromodel/example/xxx directory.
-
 ## Why does hydro-model-xaj exist

 When we want to learn about the rainfall-runoff process and make forecasts for floods, etc., we often use classic hydrological
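The README text added in this commit asks for one CSV/TXT file per basin with "time", "prcp", "pet" and "flow" columns, a continuous series sorted by time, and timestamps in "YYYY-MM-DD HH:MM:SS" format. It then describes model inputs as [time, basin, variable] arrays. The source of check_data_format.py is not shown on this page, so the following is only a pandas sketch of equivalent checks under those stated rules. validate_basin_frame and to_model_array are hypothetical names, not functions from hydromodel, and reading "continuous" as "one constant time step" is an assumption.

```python
import numpy as np
import pandas as pd

# Assumed from the README convention: bare column names and this time format.
REQUIRED_COLUMNS = ["time", "prcp", "pet", "flow"]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def validate_basin_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical checker: raise ValueError if df breaks the README rules.

    Returns a copy whose "time" column is parsed to datetimes.
    """
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    out = df.copy()
    try:
        # Every timestamp must match YYYY-MM-DD HH:MM:SS exactly.
        out["time"] = pd.to_datetime(out["time"], format=TIME_FORMAT)
    except (ValueError, TypeError) as err:
        raise ValueError(f"time values do not match {TIME_FORMAT!r}") from err

    if not out["time"].is_unique:
        raise ValueError("duplicate timestamps")
    if not out["time"].is_monotonic_increasing:
        raise ValueError("rows are not sorted by time")
    # Assumption: "continuous" means one constant step, i.e. no missing timestamps.
    if out["time"].diff().dropna().nunique() > 1:
        raise ValueError("time step is not constant")
    # NaN values in prcp/pet/flow are allowed, so they are not checked here.
    return out


def to_model_array(df: pd.DataFrame, variables=("prcp", "pet")) -> np.ndarray:
    """Hypothetical helper: one basin's data as a [time, basin, variable] array."""
    values = df[list(variables)].to_numpy(dtype=float)  # shape (time, variable)
    return values[:, np.newaxis, :]  # add a basin axis of length 1


# Illustrative values only, not real observations.
df = pd.DataFrame(
    {
        "time": ["2000-01-01 00:00:00", "2000-01-01 01:00:00", "2000-01-01 02:00:00"],
        "prcp": [0.0, 1.2, float("nan")],
        "pet": [0.1, 0.1, 0.2],
        "flow": [3.0, 3.1, 3.4],
    }
)
arr = to_model_array(validate_basin_frame(df))
print(arr.shape)  # (3, 1, 2)
```

The NaN in "prcp" passes, in line with the README's note that missing values are allowed. Only the time column is held to strict rules. Because a single basin is supported, the basin axis always has length 1 here.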
2 changes: 1 addition & 1 deletion env-dev.yml
@@ -26,4 +26,4 @@ dependencies:
   - twine
   - bump2version
   - muskingumcunge
-  - hydrodataset
+  - hydrodata
44 changes: 44 additions & 0 deletions hydromodel/datasets/__init__.py
@@ -0,0 +1,44 @@
+PRCP_NAME = "prcp(mm/day)"
+PET_NAME = "pet(mm/day)"
+ET_NAME = "et(mm/day)"
+FLOW_NAME = "flow(m^3/s)"
+NODE_FLOW_NAME = "node1_flow(m^3/s)"
+AREA_NAME = "area(km^2)"
+TIME_NAME = "time"
+TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
+ID_NAME = "id"
+NAME_NAME = "name"
+
+
+def remove_unit_from_name(name_with_unit):
+    """
+    Remove the unit from a variable name.
+
+    Parameters
+    ----------
+    name_with_unit : str
+        The name of the variable including its unit, e.g., "prcp(mm/day)".
+
+    Returns
+    -------
+    str
+        The name of the variable without the unit, e.g., "prcp".
+    """
+    return name_with_unit.split("(")[0]
+
+
+def get_unit_from_name(name_with_unit):
+    """
+    Extract the unit from a variable name.
+
+    Parameters
+    ----------
+    name_with_unit : str
+        The name of the variable including its unit, e.g., "prcp(mm/day)".
+
+    Returns
+    -------
+    str
+        The unit of the variable, e.g., "mm/day".
+    """
+    return name_with_unit.split("(")[1].strip(")") if "(" in name_with_unit else ""

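The README in this commit names the columns "prcp", "pet" and "flow". The new constants in hydromodel/datasets/__init__.py carry units instead, e.g. PRCP_NAME = "prcp(mm/day)". The following is a minimal sketch that assumes a data file whose headers follow those unit-suffixed constants. It normalizes them back to the README's bare names and keeps the units on the side, using the same string logic as remove_unit_from_name and get_unit_from_name. split_units is a hypothetical helper, not part of hydromodel, and the logic is inlined so the snippet runs without the package installed.

```python
import pandas as pd

# Unit-suffixed names, copied from hydromodel/datasets/__init__.py in this commit.
PRCP_NAME = "prcp(mm/day)"
PET_NAME = "pet(mm/day)"
FLOW_NAME = "flow(m^3/s)"


def split_units(columns):
    """Hypothetical helper: map "name(unit)" headers to bare names and units."""
    renames, units = {}, {}
    for col in columns:
        bare = col.split("(")[0]  # same logic as remove_unit_from_name
        unit = col.split("(")[1].strip(")") if "(" in col else ""  # as get_unit_from_name
        renames[col] = bare
        units[bare] = unit
    return renames, units


df = pd.DataFrame(columns=["time", PRCP_NAME, PET_NAME, FLOW_NAME])
renames, units = split_units(df.columns)
df = df.rename(columns=renames)
print(list(df.columns))  # ['time', 'prcp', 'pet', 'flow']
print(units)  # {'time': '', 'prcp': 'mm/day', 'pet': 'mm/day', 'flow': 'm^3/s'}
```

A name without parentheses, such as "time", keeps its name and gets an empty unit, which matches the fallback branch in get_unit_from_name.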