feature: add read write of table metadata #49

sabinem · 2023-08-07T13:43:25Z

- The metadata are stored in `metadata.csv` files that belong to the pipeline.
- These CSV files contain the columns `item`, `type` (table or field), `language` and `description`.
- The function `rw_metadata_table` prefills these files from the datasets; the user can then complete them.
- If the file already exists, it is read in unless the user explicitly chooses to overwrite it.
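A minimal Python sketch of the read-or-prefill behavior described above, for illustration only. The actual implementation is the R function `rw_metadata_table`; the Python helper `prefill_rows`, the `columns` argument and the default language `en` are hypothetical. The sketch assumes the template holds one `table` row plus one `field` row per dataset column, with empty descriptions for the user to fill in.

```python
import csv
from pathlib import Path

# Columns of metadata.csv as listed in the PR description.
FIELDNAMES = ["item", "type", "language", "description"]


def prefill_rows(table_name, columns, language="en"):
    """Hypothetical template: one 'table' row plus one 'field' row per column.

    Descriptions are left empty so the user can complete them by hand.
    """
    rows = [{"item": table_name, "type": "table", "language": language, "description": ""}]
    rows += [
        {"item": col, "type": "field", "language": language, "description": ""}
        for col in columns
    ]
    return rows


def rw_metadata_table(path, table_name, columns, language="en", overwrite=False):
    """Read metadata.csv if it exists; otherwise (or if overwrite=True) prefill and write it."""
    path = Path(path)
    if path.exists() and not overwrite:
        # Keep whatever the user has already filled in.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    rows = prefill_rows(table_name, columns, language)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling it a second time returns the (possibly hand-edited) file contents instead of a fresh template, which is what protects manual descriptions from being lost.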

Later, the files can be collected by another function and added as a metadata table in Postgres.

cmdoret · 2023-08-07T14:29:04Z

R/metadata_table.R

I feel like this function encapsulates two distinct jobs:

- Reading a [filled-in] metadata table
- Generating a template for manual fill-in

The latter is a one-time operation (run manually by the person producing the dataset), while the former may be rerun any time the Postgres metadata table must be regenerated.
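The split could look roughly like this sketch (Python for illustration; the repository code is R, and `write_metadata_template` and `read_metadata` are hypothetical names). Only the template writer creates files, and it refuses to overwrite an existing file unless asked, so the read step can be rerun safely whenever the Postgres table is rebuilt.

```python
import csv
from pathlib import Path

FIELDNAMES = ["item", "type", "language", "description"]


def write_metadata_template(path, rows, overwrite=False):
    """One-time step: write a template for manual fill-in.

    Does not touch an existing (possibly hand-edited) file unless
    overwrite=True. Returns True if the file was written.
    """
    path = Path(path)
    if path.exists() and not overwrite:
        return False
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return True


def read_metadata(path):
    """Repeatable step: read the filled-in file.

    It never writes, so rerunning it cannot overwrite manual edits.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```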

cmdoret · 2023-08-07T14:29:45Z

pipelines/A6/metadata.csv

We should have a way to know which table a field belongs to.

sabinem · 2023-08-08T12:19:04Z


Even though the table name is the same for all records within one pipeline, this will no longer hold once the metadata records are combined into a general table in the Postgres schema: the metadata table in Postgres will contain the metadata for all the datasets.
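A sketch of why the table name has to be stored explicitly once records are pooled, assuming each pipeline's rows are grouped under their table name (hypothetical `combine_metadata` helper, Python for illustration): the same field name, such as `year`, can appear in several datasets and is only unambiguous together with its table.

```python
def combine_metadata(per_pipeline):
    """Merge per-pipeline metadata rows into one table covering all datasets.

    per_pipeline maps a table name to its list of row dicts. Within one
    pipeline the table name is implicit, so it is written into an explicit
    'table_name' column on every row before the rows are pooled.
    """
    combined = []
    for table_name, rows in per_pipeline.items():
        for row in rows:
            # Set table_name last so it always reflects the owning pipeline.
            combined.append({**row, "table_name": table_name})
    return combined
```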

Adapted to what was discussed: there are now two metadata tables and therefore also two files:

- one for the table description
- one for the table column descriptions and column data types

For BFS px cubes, the BFS R package is used to get the table description from the BFS metadata table title.

The function was renamed.
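As a rough sketch of the two-table layout (Python dataclasses for illustration; the field names are assumptions and may not match the repository's schema): one record per table holding its description, and one record per column holding its description and data type. The hypothetical `title` parameter stands in for the BFS metadata title; this sketch does not call the BFS package.

```python
from dataclasses import dataclass


@dataclass
class TableMetadata:
    """One record per table: the table-level description."""
    table_name: str
    language: str
    description: str


@dataclass
class ColumnMetadata:
    """One record per column: column description plus its data type."""
    table_name: str
    column_name: str
    language: str
    description: str
    data_type: str


def prefill(table_name, column_types, language="en", title=""):
    """Build both templates for one dataset.

    column_types maps column name -> data type. `title` pre-populates the
    table description; column descriptions are left empty for the user.
    """
    table = TableMetadata(table_name, language, title)
    columns = [
        ColumnMetadata(table_name, name, language, "", dtype)
        for name, dtype in column_types.items()
    ]
    return table, columns
```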

sabinem · 2023-08-09T13:52:08Z

Update: the output was adapted according to #50

cmdoret

The schema of the table looks good!
One comment: I just realized there's one thing I'm not sure I understand. In `metadata_table_columns$data_type`, what determines whether we put `text` or `numeric`?

For example, if I have a column `pet_type` with values `dog` and `cat`, I would assume it is categorical. If I have 120 categories, is it still categorical? This would happen, for example, in the criminal offence dataset, where the `offence` column has 260 different values.

sabinem · 2023-08-09T15:13:19Z

@cmdoret I think the distinction between categorical and numeric is generally very easy:

### Original datasets

- All these tables originally have one data column with numeric values, so that column is numeric.
- All other columns are dimensions. We disregard the year and spatial unit, but all remaining dimensions are always categorical.

### Modified datasets

When we use the wide-table approach, we sometimes create additional numeric data columns by splitting one categorical column into several columns named after its categorical values. All these new columns contain the original data values and are therefore also numeric.
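The rule above can be sketched as a small function (Python, hypothetical names; it assumes categorical columns map to `text` and that the year and spatial unit columns are called `year` and `spatialunit`). The number of distinct values plays no role: a column like `offence` with 260 values is still `text`.

```python
def infer_data_types(columns, value_columns, skip=("year", "spatialunit")):
    """Assign 'numeric' or 'text' to each column following the rule above.

    value_columns: the original data column plus any columns created by
    pivoting a categorical column to wide format. Every other column is a
    dimension and treated as categorical text. Columns in `skip` are left
    out of the result.
    """
    values = set(value_columns)
    return {
        col: ("numeric" if col in values else "text")
        for col in columns
        if col not in skip
    }
```

For a long table only the data column is numeric; after pivoting `pet_type` to wide format, the new `dog` and `cat` columns are numeric.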

Does this answer your question?

cmdoret · 2023-08-09T15:18:02Z

Yes, thank you @sabinem !

sabinem requested a review from cmdoret August 7, 2023 13:43

cmdoret reviewed Aug 7, 2023


sabinem added 2 commits August 8, 2023 15:11

sabinem force-pushed the add-metadata-table branch from afac7e4 to 68a5b9a August 8, 2023 13:11

sabinem requested a review from philbosch August 9, 2023 13:53

cmdoret self-requested a review August 9, 2023 14:20

cmdoret reviewed Aug 9, 2023


cmdoret self-requested a review August 9, 2023 15:26

cmdoret approved these changes Aug 9, 2023


sabinem merged commit fe014e4 into main Aug 9, 2023

sabinem deleted the add-metadata-table branch September 18, 2023 08:24
