This repository has been archived by the owner on May 28, 2024. It is now read-only.

Directory structure & metadata for parquet files #9

Open
mjcollin opened this issue Jun 16, 2017 · 1 comment

Comments

@mjcollin
Contributor

mjcollin commented Jun 16, 2017

Metadata is in #7 but if it's going to be in a separate file, then that needs to be considered here too.

The nested or faceted style of storage is really cool ("/guoda/data/gbif-idigbio.parquet/source=gbif") and intuitive to use. However, coding mistakes like the example just there (it doesn't have the date on the end, so it loads two copies of GBIF) can get disastrous quickly as we add more sources and versions of data sets. Since our target users for this platform include intermediate programmers, do we want a safer syntax?
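One shape a safer syntax could take, as a rough sketch only: a small helper that refuses to build a path unless both the source and the date are given, so the partial path above cannot be produced by accident. The helper name `partition_path`, the `date=` partition key, and the `YYYYMMDD` date format are all assumptions for illustration; only the base path and the `source=` key come from the example above.

```python
import re

# Base path taken from the example in this issue.
BASE = "/guoda/data/gbif-idigbio.parquet"


def partition_path(source, date, base=BASE):
    """Build a fully qualified partition path (hypothetical helper).

    Both parts are mandatory, so a caller cannot silently load every
    version of a source by leaving the date off.
    """
    if not source:
        raise ValueError("source is required, e.g. 'gbif'")
    # Assumed format: eight digits, YYYYMMDD. The real layout may differ.
    if not date or not re.fullmatch(r"\d{8}", date):
        raise ValueError("date is required as YYYYMMDD")
    # 'date=' is an assumed partition key name.
    return f"{base}/source={source}/date={date}"
```

A caller would pass the returned string to whatever reader they use instead of typing the path by hand. Leaving the date off then fails loudly rather than loading duplicate records.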

@mjcollin
Contributor Author

A possible compromise: the archive version of data storage uses Jorrit's pattern, and the latest version is kept as a copy at a fixed file name in the existing data folder. HDFS does not support symlinks, so we'll have to have copies somewhere.
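A rough sketch of the "copy the latest to a fixed name" step, using the local filesystem as a stand-in for HDFS. It assumes the archive holds one directory per version named `date=YYYYMMDD`, so that sorting by name gives chronological order. That naming and the function name `publish_latest` are assumptions for illustration, not a description of Jorrit's pattern.

```python
import shutil
from pathlib import Path


def publish_latest(archive_dir, latest_path):
    """Copy the newest dated snapshot to a fixed location (hypothetical helper).

    Stand-in for an HDFS copy: no symlinks are used, the data is duplicated.
    """
    archive_dir, latest_path = Path(archive_dir), Path(latest_path)
    # Assumed naming: date=YYYYMMDD, so name order is date order.
    snapshots = sorted(
        p for p in archive_dir.iterdir()
        if p.is_dir() and p.name.startswith("date=")
    )
    if not snapshots:
        raise FileNotFoundError(f"no dated snapshots under {archive_dir}")
    newest = snapshots[-1]
    # Replace any previous copy so the fixed name always holds one version.
    if latest_path.exists():
        shutil.rmtree(latest_path)
    shutil.copytree(newest, latest_path)
    return newest
```

Users who just want current data would read from the fixed location and never see the partitioned layout. The cost is one extra copy of the newest version on disk.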
