feature: add read write of table metadata #49
Conversation
R/metadata_table.R
I feel like this function encapsulates two distinct jobs:
- Reading a [filled-in] metadata table
- Generating a template for manual fill-in
The former is a one-time operation (run manually by the person producing the dataset), while the latter may be rerun anytime the postgres metadata table must be regenerated.
pipelines/A6/metadata.csv
We should have a way to know what table a field belongs to.
- the metadata are stored in files `metadata.csv` that belong to the pipeline
- these csv files contain the columns `item`, `type` (table or field), `language`, `description`
- the function `rw_metadata_table` prefills these files from the datasets; they can then be completed by the user
- if the file already exists, it will be read in unless the user explicitly chooses to overwrite

Later the files can be collected by another function and added as a metadata table in postgres.
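Roughly, the prefill behaviour described above might look like the following (a minimal sketch only, not the code in this PR; the function name `rw_metadata_sketch`, the `overwrite` argument, and the default language are assumptions):

```r
# Hypothetical sketch: prefill a pipeline-level metadata.csv from a data
# frame, re-reading the file if it already exists unless the user
# explicitly asks to overwrite it.
rw_metadata_sketch <- function(df, path = "metadata.csv", overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    # an already filled-in file takes precedence
    return(utils::read.csv(path, stringsAsFactors = FALSE))
  }
  template <- data.frame(
    item        = c("table", names(df)),               # the table itself plus each field
    type        = c("table", rep("field", ncol(df))),  # "table" or "field"
    language    = "en",                                 # assumed default language
    description = "",                                   # to be completed by the user
    stringsAsFactors = FALSE
  )
  utils::write.csv(template, path, row.names = FALSE)
  template
}
```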
Even though for one pipeline the table name will be the same, this will not hold when the metadata records are combined in a general table in the postgres schema: the metadata table in postgres will contain the metadata for all the datasets.
afac7e4 to 68a5b9a
Adapted to what was discussed: there are now two metadata tables and therefore also two files
- one for the table description
- another one for the table column descriptions and column data types

For BFS px cubes the BFS R package is used to get the table description from the BFS metadata table title. The function was renamed.
Update: the output was adapted according to #50
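For illustration, a rough sketch of what the two files could contain (hypothetical values; apart from `data_type` and the A6 pipeline and "offence" column mentioned in this thread, the column names and example content are assumptions):

```r
# Hypothetical example of the two metadata files per pipeline.

# 1) table description: one row per table
metadata_tables <- data.frame(
  name        = "A6",
  language    = "en",
  description = "Title taken from the BFS metadata for px cubes",
  stringsAsFactors = FALSE
)

# 2) column descriptions and data types: one row per column,
#    carrying the table name so records can be combined across pipelines
metadata_table_columns <- data.frame(
  table_name  = "A6",
  name        = c("year", "offence", "count"),
  data_type   = c("numeric", "text", "numeric"),
  language    = "en",
  description = "",
  stringsAsFactors = FALSE
)
```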
The schema of the table looks good!

One comment - I just realized there's one thing I'm not sure I understand: in `metadata_table_columns$data_type`, what determines whether we put `text` or `numeric`?

For example, if I have a column `pet_type` with values `dog` and `cat`, I would assume it is categorical. If I have 120 categories, is it still categorical? This would for example happen in the criminal offence dataset, where the "offence" column has 260 different values.
@cmdoret I think the distinction between categorical and numeric is generally very easy:

### Original datasets

### Modified datasets

Does this answer your question?
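One hedged way to read the distinction in practice is to derive `data_type` from the R column class - character or factor columns map to `text`, numeric columns to `numeric` - regardless of how many distinct values they contain. This is a sketch of that idea, not necessarily the rule applied in this PR:

```r
# Sketch: map R column classes to the data_type values used in
# metadata_table_columns. Assumption: categorical columns such as
# pet_type or offence are stored as character/factor and therefore
# become "text" no matter how many distinct values they have.
infer_data_type <- function(df) {
  vapply(df, function(col) {
    if (is.numeric(col)) "numeric" else "text"
  }, character(1))
}

# Example
pets <- data.frame(pet_type = c("dog", "cat"), age = c(3, 5))
infer_data_type(pets)
#>  pet_type       age
#>    "text" "numeric"
```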
Yes, thank you @sabinem!