Commit

updates data folder struc and improve readme
BielStela committed Jul 26, 2024
1 parent 6b98fd5 commit 4bc7e3d
Showing 12 changed files with 662 additions and 62 deletions.
62 changes: 20 additions & 42 deletions science/README.md
@@ -10,20 +10,13 @@ Take a look at the [Kedro documentation](https://docs.kedro.org) to get started.

The project contains one pipeline for now: `globe`

### `global`
### `lowvshigh`

Pipeline to split nextgems global datasets (low and high resolution) into a set of tiffs (one per timestep) to use in blender to render a rotating globe.
Pipeline to generate the comparison between low- and high-resolution simulations. It currently includes:

###
- a step that splits nextgems global datasets into a set of tiffs (one per timestep) to use in Blender to render a rotating globe.
- a video generation pipeline for the regions defined in `conf/parameters.yml`.
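
As a rough illustration of the "one tiff per timestep" idea, the sketch below maps each timestep to its own zero-padded output path so that the files sort in frame order. It is not the project's actual node code: the function name `frame_paths`, the `data/frames` directory, and the `frame_0001.tif` naming scheme are all hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path


def frame_paths(timesteps, out_dir, prefix="frame"):
    """Return one zero-padded .tif path per timestep, in chronological order.

    Hypothetical helper: the real pipeline may name its outputs differently.
    """
    ordered = sorted(timesteps)
    # Pad to at least 4 digits so the files sort correctly as an image sequence.
    width = max(4, len(str(len(ordered))))
    return {
        ts: Path(out_dir) / f"{prefix}_{i:0{width}d}.tif"
        for i, ts in enumerate(ordered, start=1)
    }


# Three made-up daily timesteps standing in for the dataset's time axis.
start = datetime(2020, 1, 1)
timesteps = [start + timedelta(days=d) for d in range(3)]
paths = frame_paths(timesteps, "data/frames")

for ts, path in paths.items():
    print(ts.date(), path.as_posix())
```

Numbering the frames rather than embedding the timestamp in the filename is only one possible choice; it keeps the sequence easy to load in order.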

## Rules and guidelines

In order to get the best out of the template:

* Don't remove any lines from the `.gitignore` file we provide
* Make sure your results can be reproduced by following a [data engineering convention](https://docs.kedro.org/en/stable/faq/faq.html#what-is-data-engineering-convention)
* Don't commit data to your repository
* Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in `conf/local/`

## How to install dependencies

@@ -42,51 +35,36 @@ You can run your Kedro project with:
```
kedro run
```

## How to test your Kedro project

Have a look at the files `src/tests/test_run.py` and `src/tests/pipelines/data_science/test_pipeline.py` for instructions on how to write your tests. Run the tests as follows:
I recommend using the `ParallelRunner` to run the nodes in parallel:

```
pytest
kedro run --runner=ParallelRunner
```

To configure the coverage threshold, look at the `.coveragerc` file.

## Project dependencies

To see and update the dependency requirements for your project use `requirements.txt`. Install the project requirements with `pip install -r requirements.txt`.
### Run a subset of the pipeline

[Further information about project dependencies](https://docs.kedro.org/en/stable/kedro_project_setup/dependencies.html#project-specific-dependencies)

## How to work with Kedro and notebooks

> Note: Using `kedro jupyter` or `kedro ipython` to run your notebook provides these variables in scope: `catalog`, `context`, `pipelines` and `session`.
>
> Jupyter, JupyterLab, and IPython are already included in the project requirements by default, so once you have run `pip install -r requirements.txt` you will not need to take any extra steps before you use them.
### Jupyter
Kedro allows running a subset of the project by selecting specific nodes, pipelines, or tags. Check the tags in the pipeline code or in Kedro Viz.
For example, to run only the detailed video pipelines, use:

```
kedro jupyter notebook
kedro run --runner=ParallelRunner --tags zoomin
```
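
To make the effect of `--tags` more concrete, here is a toy model of tag-based selection in plain Python. It is not Kedro's implementation and does not import Kedro; the `Node` class and the node names are hypothetical, and only the `zoomin` tag comes from this README. It mirrors the documented behaviour that a node is selected when it carries any of the requested tags.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a pipeline node: a name plus a set of tags."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def only_nodes_with_tags(nodes, *tags):
    """Keep the nodes that carry at least one of the requested tags."""
    wanted = set(tags)
    return [n for n in nodes if wanted & n.tags]


# Hypothetical nodes; the real pipeline defines its own names and tags.
nodes = [
    Node("split_to_tiffs", frozenset({"globe"})),
    Node("crop_region", frozenset({"zoomin"})),
    Node("render_region_video", frozenset({"zoomin", "video"})),
]

selected = [n.name for n in only_nodes_with_tags(nodes, "zoomin")]
print(selected)
```

In this toy example only the two nodes tagged `zoomin` are kept, which is the same kind of filtering the command above applies before the runner executes anything.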

You can also start JupyterLab:

```
kedro jupyter lab
```
## Kedro viz

### IPython
And if you want to run an IPython session:
Visualize the pipeline with:

```
kedro ipython
kedro viz
```

### How to ignore notebook output cells in `git`
To automatically strip out all output cell contents before committing to `git`, you can use tools like [`nbstripout`](https://github.com/kynan/nbstripout). For example, you can add a hook in `.git/config` with `nbstripout --install`. This will run `nbstripout` before anything is committed to `git`.

> *Note:* Your output cells will be retained locally.
## Rules and guidelines

[Further information about using notebooks for experiments within Kedro projects](https://docs.kedro.org/en/develop/notebooks_and_ipython/kedro_and_notebooks.html).
In order to get the best out of the template:

* Don't remove any lines from the `.gitignore` file we provide
* Make sure your results can be reproduced by following a [data engineering convention](https://docs.kedro.org/en/stable/faq/faq.html#what-is-data-engineering-convention)
* Don't commit data to your repository
* Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in `conf/local/`
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
18 changes: 18 additions & 0 deletions science/requirements.in
@@ -0,0 +1,18 @@
ipython>=8.10
jupyterlab>=3.0
kedro~=0.19.6
kedro-datasets>=3.0; python_version >= "3.9"
kedro-datasets>=1.0; python_version < "3.9"
kedro-datasets[netcdf, rioxarray]
kedro-telemetry>=0.3.1
kedro-viz>=6.7.0
notebook
pytest~=7.2
pytest-cov~=3.0
pytest-mock>=1.7.1, <2.0
ruff~=0.1.8
matplotlib
cartopy
scikit-image
pillow
opencv-python