
CatalogBuilder


The CatalogBuilder API collects the building blocks necessary to build a data catalog that can then be ingested into climate analysis scripts and workflows, leveraging intake-esm and xarray.

Tested on POSIX file systems, S3, and GFDL post-processed data (select simulations and components) at this time. This repository has unit tests (pytest), which run in GitHub Actions whenever a PR is opened or a push is made.

Ways to use intakebuilder

[A]

  1. Set up the environment using environment_intake.yml.
  2. Clone https://github.com/aradhakrishnanGFDL/CatalogBuilder
  3. From the CatalogBuilder directory, you can import the package.
  4. Test: import intakebuilder (see the sanity check below).
  5. See examples in the examples directory: gen_intake_gfdl.py, gen_intake_local.py (runs on UDA), and gen_intake_s3.py (runs on an S3 bucket).
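A quick sanity check for the import step (illustrative only; it simply confirms the package can be found on your Python path):

```python
# Illustrative sanity check: confirm intakebuilder is importable after setup.
import intakebuilder

# Show where the package was loaded from (cloned repo or installed location).
print(intakebuilder.__file__)
```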

[B]

  1. Set up the conda environment using environment_intake.yml: conda env create -f environment_intake.yml
  2. conda activate intake
  3. pip install git+https://github.com/aradhakrishnanGFDL/CatalogBuilder.git
  4. Test: import intakebuilder
  5. See examples in the examples directory: gen_intake_local.py runs on UDA and gen_intake_s3.py runs on an S3 bucket.

[C]

This option is specifically for GFDL workstations. Use the testgfdl branch.

  1. module load conda
  2. Install your own conda environment using the environment_intake.yml in the repo, OR use the existing environment by doing conda activate /nbhome/a1r/miniconda3/envs/intake
  3. Check builderconfig.py for input/output specifications and edit as needed, e.g. to add more columns to the CSV.
  4. Run gen_intake_gfdl.py in the cloned repo and check the CSV file generated.
  5. [Bonus] Use the intake-esm package to load the CSV/JSON catalog (see the sketch below). Refer to the examples in this repo under notebooks.
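As a sketch of the bonus step, a generated catalog can be opened with intake-esm roughly as follows. The catalog name matches the output.json produced in section [D] below, and the column used in .search() is an assumption that depends on builderconfig.py, so adapt both to your own output:

```python
# Minimal sketch: load a builder-generated catalog with intake-esm.
# "output.json" is the name produced by the example in [D]; use your own catalog name.
import intake

catalog = intake.open_esm_datastore("output.json")  # the JSON points at the CSV
print(catalog.df.head())                            # catalog rows as a pandas DataFrame

# Optionally narrow the catalog and open the matching files as xarray datasets.
# The column name below is illustrative; use a column defined in builderconfig.py.
subset = catalog.search(frequency="monthly")
datasets = subset.to_dataset_dict()                 # dict of xarray.Dataset objects
```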

[D]

To work with GFDL post-processed (PP) data, please use the following instructions while we refine the instructions and code base to be more unified.

  1. module load conda

  2. Install your own conda environment using the environment_intake.yml in the repo:

    conda env create -f environment_intake.yml
    conda activate [new_env]

    You can also activate the existing environment directly for testing: conda activate /nbhome/a1r/miniconda3/envs/intake

  3. git clone this repository.

  4. Run gen_intake_gfdl.py in the cloned repo with the PP directory as the first argument and the name of the output file as the second.

The following example generates a sample catalog called output.csv at the given output path:

    ./gen_intake_gfdl.py /archive/am5/am5/am5f3b1r0/c96L65_am5f3b1r0_pdclim1850F/gfdl.ncrc5-deploy-prod-openmp/pp /archive/am5/am5/am5f3b1r0/c96L65_am5f3b1r0_pdclim1850F/gfdl.ncrc5-deploy-prod-openmp/pp/output

The above creates an output.csv and an output.json.
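For a quick look at what was generated, the CSV can be inspected with pandas (illustrative only; the exact columns depend on builderconfig.py):

```python
# Illustrative: peek at the generated catalog CSV with pandas.
import pandas as pd

df = pd.read_csv("output.csv")   # path to the catalog generated above
print(df.columns.tolist())       # catalog columns defined in builderconfig.py
print(df.head())                 # first few catalog entries
```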

Additional flags

To overwrite an existing CSV file at the given output path, use the '--overwrite' flag. To append to an existing CSV file at the given output path, use the '--append' flag.

The catalog builder wrapper accepts two arguments: an input path where data is stored and an output path where the final CSV file will be written.

NOTE: Currently, only time series output files are included in the output CSV.
