Merge pull request #1 from Vizzuality/add-science-project
Add data pipeline with kedro
BielStela authored Jul 26, 2024
2 parents ddb4d0e + 4bc7e3d commit e2ae6eb
Showing 40 changed files with 268,970 additions and 0 deletions.
151 changes: 151 additions & 0 deletions science/.gitignore
@@ -0,0 +1,151 @@
##########################
# KEDRO PROJECT

# ignore all local configuration
conf/local/**
!conf/local/.gitkeep

# ignore potentially sensitive credentials files
conf/**/*credentials*

# ignore everything in the following folders
data/**

# except their sub-folders
!data/**/

# also keep all .gitkeep files
!.gitkeep

# keep also the example dataset
!data/01_raw/*


##########################
# Common files

# IntelliJ
.idea/
*.iml
out/
.idea_modules/

### macOS
*.DS_Store
.AppleDouble
.LSOverride
.Trashes

# Vim
*~
.*.swo
.*.swp

# emacs
*~
\#*\#
/.emacs.desktop
/.emacs.desktop.lock
*.elc

# JIRA plugin
atlassian-ide-plugin.xml

# C extensions
*.so

### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# mkdocs documentation
/site

# mypy
.mypy_cache/
17 changes: 17 additions & 0 deletions science/.pre-commit-config.yaml
@@ -0,0 +1,17 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.4
    hooks:
      - id: ruff
        args: [ --fix ]
        types_or: [ python, pyi, jupyter ]

      - id: ruff-format
        types_or: [ python, pyi, jupyter ]
1 change: 1 addition & 0 deletions science/.telemetry
@@ -0,0 +1 @@
consent: false
70 changes: 70 additions & 0 deletions science/README.md
@@ -0,0 +1,70 @@
# Digital Twins visual communication data processing

## Overview

This is your new Kedro project with Kedro-Viz setup, which was generated using `kedro 0.19.6`.

Take a look at the [Kedro documentation](https://docs.kedro.org) to get started.

## Pipelines

The project contains one pipeline for now: `globe`

### `lowvshigh`

Pipeline to generate the comparison between low- and high-resolution simulations. It currently:

- splits the nextGEMS global datasets into a set of GeoTIFFs (one per timestep), which are used in Blender to render a rotating globe.
- generates videos for the regions defined in `conf/parameters.yml`.
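A minimal sketch of the per-timestep split, assuming the node receives an `xarray.DataArray` with a `time` dimension (e.g. one variable selected from the loaded NetCDF dataset). `split_by_timestep` is a hypothetical name, not the project's actual node. Returning a dict works because a `PartitionedDataset` saves each key as a separate partition file, so each timestep ends up in its own `.tif`.

```python
from typing import Any, Dict


def split_by_timestep(data: Any, time_dim: str = "time") -> Dict[str, Any]:
    """Split an xarray-like object into one partition per timestep.

    Assumes ``data`` exposes xarray's ``sizes`` mapping and ``isel`` method
    (true for ``xarray.DataArray``). Keys are zero-padded so that sorting
    them as strings also sorts them chronologically.
    """
    n_steps = data.sizes[time_dim]
    # At least 4 digits, more if the number of timesteps requires it.
    width = max(4, len(str(n_steps - 1)))
    return {
        str(i).zfill(width): data.isel({time_dim: i})
        for i in range(n_steps)
    }
```

Zero-padding the partition ids keeps the files in timestep order when they are listed or loaded back, which matters later when frames are stitched into a video or a globe animation.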


## How to install dependencies

Declare any dependencies in `requirements.txt` for `pip` installation.

To install them, run:

```
pip install -r requirements.txt
```

## How to run your Kedro pipeline

You can run your Kedro project with:

```
kedro run
```

I recommend using the `ParallelRunner` to run the nodes in parallel:

```
kedro run --runner=ParallelRunner
```
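`ParallelRunner` executes nodes in separate processes, so node functions and the datasets they use must be serialisable with `pickle`. The helper below is a hypothetical, stdlib-only sketch for checking an object before relying on `ParallelRunner`; it is not part of Kedro.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if ``obj`` survives a pickle round-trip.

    Objects such as lambdas, locks or open handles typically fail, which
    would prevent them from being sent to worker processes.
    """
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

If a node fails this check, defining it as a module-level function (rather than a lambda or closure) is usually enough.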

### Run a subset of the pipeline

Kedro can run a subset of the pipeline by selecting specific nodes, pipelines or tags. Check the available tags in the pipeline code or in Kedro-Viz.
For example, to run only the detailed (zoom-in) video pipelines, use:

```
kedro run --runner=ParallelRunner --tags zoomin
```
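When several tags are passed (e.g. `--tags globe,zoomin`), Kedro keeps every node that carries *any* of them, the same rule as `Pipeline.only_nodes_with_tags`. The stdlib-only sketch below illustrates that selection rule; the node names and tag sets are hypothetical, not taken from this project.

```python
from typing import Dict, Iterable, List, Set


def select_by_tags(node_tags: Dict[str, Set[str]], tags: Iterable[str]) -> List[str]:
    """Return the names of nodes carrying at least one of ``tags``."""
    wanted = set(tags)
    # A non-empty intersection means the node has at least one requested tag.
    return sorted(name for name, node_tag_set in node_tags.items() if node_tag_set & wanted)


nodes = {
    "split_ws": {"globe"},
    "zoom_video": {"zoomin"},
    "both": {"globe", "zoomin"},
}
print(select_by_tags(nodes, ["zoomin"]))
```

Because the selection is a union, adding more tags to the command can only widen the set of nodes that run.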


## Kedro viz

Visualize the pipeline with:

```
kedro viz
```


## Rules and guidelines

In order to get the best out of the template:

* Don't remove any lines from the `.gitignore` file we provide
* Make sure your results can be reproduced by following a [data engineering convention](https://docs.kedro.org/en/stable/faq/faq.html#what-is-data-engineering-convention)
* Don't commit data to your repository
* Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in `conf/local/`
20 changes: 20 additions & 0 deletions science/conf/README.md
@@ -0,0 +1,20 @@
# What is this for?

This folder should be used to store configuration files used by Kedro or by separate tools.

This file can be used to provide users with instructions for how to reproduce local configuration with their own credentials. You can edit the file however you like, but you may wish to retain the information below and add your own section in the section titled **Instructions**.

## Local configuration

The `local` folder should be used for configuration that is either user-specific (e.g. IDE configuration) or protected (e.g. security keys).

> *Note:* Please do not check in any local configuration to version control.
## Base configuration

The `base` folder is for shared configuration, such as non-sensitive and project-related configuration that may be shared across team members.

WARNING: Please do not put access credentials in the base configuration folder.
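When the same key is defined in both folders, the `local` value wins. A minimal sketch of that override, assuming Kedro's `OmegaConfigLoader` with its default `destructive` merge strategy, where a top-level key in `local` replaces the whole `base` entry rather than being deep-merged. `merge_destructive` is a hypothetical helper for illustration only.

```python
from typing import Any, Dict


def merge_destructive(base: Dict[str, Any], local: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay ``local`` on ``base`` at the top level only.

    Nested keys of a ``base`` entry are dropped if ``local`` redefines
    that entry; ``base`` itself is left unmodified.
    """
    merged = dict(base)
    merged.update(local)
    return merged
```

In practice this means a `local` override of a catalog entry must repeat every field (including `type`), not just the one being changed.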

## Find out more
You can find out more about configuration from the [user guide documentation](https://docs.kedro.org/en/stable/configuration/configuration_basics.html).
106 changes: 106 additions & 0 deletions science/conf/base/catalog.yml
@@ -0,0 +1,106 @@
# ============== WINDSPEED ================

wind_speed_global_100km.raw:
  type: kedro_datasets_experimental.netcdf.NetCDFDataset
  filepath: data/01_raw/nextgems/ws_global_100km.nc

wind_speed_global_10km.raw:
  type: kedro_datasets_experimental.netcdf.NetCDFDataset
  filepath: data/01_raw/nextgems/ws_global_10km.nc

wind_speed_global_100km.parts:
  type: partitions.PartitionedDataset
  path: data/03_primary/ws-100-parts
  dataset:
    type: kedro_datasets_experimental.rioxarray.GeoTIFFDataset
    save_args:
      compress: zstd
  filename_suffix: ".tif"

wind_speed_global_10km.parts:
  type: partitions.PartitionedDataset
  path: data/03_primary/ws-10-parts
  dataset:
    type: kedro_datasets_experimental.rioxarray.GeoTIFFDataset
    save_args:
      compress: zstd
  filename_suffix: ".tif"




# ============== CLOUD COVER ================


cloud_cover_10km.raw:
  type: kedro_datasets_experimental.netcdf.NetCDFDataset
  filepath: data/01_raw/nextgems/lcc_global_10km.nc

cloud_cover_100km.raw:
  type: kedro_datasets_experimental.netcdf.NetCDFDataset
  filepath: data/01_raw/nextgems/lcc_global_100km.nc


cloud_cover_10km.parts:
  type: partitions.PartitionedDataset
  path: data/02_intermediate/amazonia-10-parts
  dataset:
    type: kedro_datasets_experimental.rioxarray.GeoTIFFDataset
    save_args:
      compress: zstd
  filename_suffix: ".tif"

cloud_cover_100km.parts:
  type: partitions.PartitionedDataset
  path: data/02_intermediate/amazonia-100-parts
  dataset:
    type: kedro_datasets_experimental.rioxarray.GeoTIFFDataset
    save_args:
      compress: zstd
  filename_suffix: ".tif"

cloud_cover_10km.video:
  type: video.VideoDataset
  filepath: data/03_primary/cloud_cover_10km.mp4

cloud_cover_100km.video:
  type: video.VideoDataset
  filepath: data/03_primary/cloud_cover_100km.mp4

# ============== PRECIPITATION ================


total_precipitation_10km.raw:
  type: kedro_datasets_experimental.netcdf.NetCDFDataset
  filepath: data/01_raw/nextgems/tp_global_10km.nc

total_precipitation_100km.raw:
  type: kedro_datasets_experimental.netcdf.NetCDFDataset
  filepath: data/01_raw/nextgems/tp_global_100km.nc

total_precipitation_10km.parts:
  type: partitions.PartitionedDataset
  path: data/02_intermediate/hurricane-10-parts
  dataset:
    type: kedro_datasets_experimental.rioxarray.GeoTIFFDataset
    save_args:
      compress: zstd
  filename_suffix: ".tif"

total_precipitation_100km.parts:
  type: partitions.PartitionedDataset
  path: data/02_intermediate/hurricane-100-parts
  dataset:
    type: kedro_datasets_experimental.rioxarray.GeoTIFFDataset
    save_args:
      compress: zstd
  filename_suffix: ".tif"


total_precipitation_10km.video:
  type: video.VideoDataset
  filepath: data/03_primary/tp_global_10km.mp4

total_precipitation_100km.video:
  type: video.VideoDataset
  filepath: data/03_primary/tp_global_100km.mp4
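When a `*.parts` entry is used as a node input, Kedro passes it as a dict mapping each partition id (with the `.tif` suffix stripped) to a zero-argument load function, so no file is read until that function is called. A minimal sketch of a node step that materialises the partitions in timestep order, e.g. before encoding them into one of the `*.video` outputs; `load_partitions_in_order` is a hypothetical name, and the ordering relies on zero-padded partition ids.

```python
from typing import Any, Callable, Dict, List


def load_partitions_in_order(partitions: Dict[str, Callable[[], Any]]) -> List[Any]:
    """Call each partition's load function in sorted partition-id order."""
    # Sorting the string ids gives chronological order only if they are zero-padded.
    return [partitions[pid]() for pid in sorted(partitions)]
```

This loads every frame into memory at once; for long simulations, a generator that yields `partitions[pid]()` lazily would keep memory usage flat.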