This repository has been archived by the owner on May 28, 2024. It is now read-only.

Embed metadata about when/how parquets were generated #7

Open
mjcollin opened this issue Feb 14, 2017 · 1 comment

Comments

@mjcollin
Contributor

mjcollin commented Feb 14, 2017

The Parquet files should carry metadata about when they were generated, the data source, the citation, and perhaps the code version, machine, runtime, etc., so that they can stand on their own instead of relying on things like the file name to convey this information.

Also, assign each one a UUID to serve as a GUID. (Handle?)
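A minimal sketch of collecting those provenance fields in one place, assuming Python on the generating side. The function name `build_provenance` and every key name are hypothetical; the issue lists the kinds of information wanted but does not fix a schema. "runtime" is interpreted here as elapsed generation time in seconds, which is also an assumption.

```python
import json
import platform
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_provenance(data_source: str, citation: str,
                     code_version: Optional[str] = None,
                     runtime_seconds: Optional[float] = None) -> dict:
    """Collect provenance fields for one generated data set.

    Hypothetical helper: key names are illustrative, not an agreed schema.
    """
    return {
        # Fresh random (version 4) UUID to act as a GUID for this data set.
        "uuid": str(uuid.uuid4()),
        # Generation time in UTC, ISO 8601 with an explicit +00:00 offset.
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "data_source": data_source,
        "citation": citation,
        # e.g. a git commit hash, if the caller supplies one.
        "code_version": code_version,
        # Network name of the machine that generated the file.
        "machine": platform.node(),
        # Assumed meaning: elapsed generation time, supplied by the caller.
        "runtime_seconds": runtime_seconds,
    }


if __name__ == "__main__":
    meta = build_provenance("example source", "Example citation",
                            code_version="abc123", runtime_seconds=42.0)
    print(json.dumps(meta, indent=2))
```

Generating the UUID once per data set and storing it with the other fields means the identifier travels with the data rather than living only in a file name.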

@mjcollin
Contributor Author

mjcollin commented Jun 4, 2017

Looks like Spark can't do this yet. There is an open ticket requesting the ability to write custom Parquet footer metadata, but it has seen no progress in years:

https://stackoverflow.com/questions/42433111/how-to-add-extra-metadata-when-writing-to-parquet-files-using-spark

https://issues.apache.org/jira/browse/SPARK-10803

Possibly write a .metadata file full of JSON when writing data sets, but how would that interact with the long-term plan for naming paths //
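A minimal sketch of the sidecar idea, assuming each data set is written as a directory (as Spark does for Parquet output) and the JSON file sits inside it under the `.metadata` name mentioned above. The helper names `write_sidecar` and `read_sidecar` are hypothetical, and this does not settle the open question about how it fits the path-naming plan.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(dataset_dir, metadata: dict, name: str = ".metadata") -> Path:
    """Write metadata as a JSON file inside dataset_dir (hypothetical helper)."""
    path = Path(dataset_dir) / name
    # Create the data set directory if the writer has not done so yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps output order stable, which makes files easy to diff.
    path.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return path


def read_sidecar(dataset_dir, name: str = ".metadata") -> dict:
    """Load the JSON sidecar back into a dict (hypothetical helper)."""
    return json.loads((Path(dataset_dir) / name).read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical data set directory name, for illustration only.
        ds = Path(tmp) / "example_dataset.parquet"
        write_sidecar(ds, {"data_source": "example", "uuid": "example-uuid"})
        print(read_sidecar(ds))
```

Unlike footer metadata, a sidecar file can be separated from the data if only the part files are copied, so it helps only as long as the directory travels intact.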
