This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

feature: add read write of table metadata #49

Merged: 3 commits merged into main from add-metadata-table on Aug 9, 2023



@sabinem sabinem commented Aug 7, 2023

  • the metadata are stored in files metadata.csv that belong to the pipeline
  • these csv files contain the columns iten , type (table or field), language, description
  • the function rw_metadata_table prefills these files from the datasets, they can then be completed by the user
  • if the file already exists, it is read in unless the user explicitly chooses to overwrite it

Later the files can be collected by another function and added as a metadata table in postgres
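
As a rough illustration of the read-or-prefill behaviour described above, here is a minimal Python sketch. It is not the actual `rw_metadata_table` code (the pipeline appears to be written in R). The signature, the `overwrite` flag name, and the choice to prefill one row per table and per field for a single language are assumptions; only the column names and the read-unless-overwrite rule come from the description.

```python
import csv
import os

# Column names as listed in the PR description (spelling kept as-is).
COLUMNS = ["iten", "type", "language", "description"]


def rw_metadata_table(path, table_name, field_names, language="en", overwrite=False):
    """Read metadata.csv if it exists; otherwise prefill a template.

    Hypothetical signature: only the read-or-prefill behaviour is taken
    from the PR description.
    """
    if os.path.exists(path) and not overwrite:
        # Existing file: return what the user has already filled in.
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    # Prefill one row for the table and one per field; descriptions stay
    # empty so that the user can complete them by hand.
    rows = [{"iten": table_name, "type": "table",
             "language": language, "description": ""}]
    rows += [
        {"iten": name, "type": "field", "language": language, "description": ""}
        for name in field_names
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling the function a second time with the same path returns the stored rows unchanged, and passing `overwrite=True` regenerates the template from the current field names.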

@sabinem sabinem requested a review from cmdoret August 7, 2023 13:43

I feel like this function encapsulates two distinct jobs:

  • Reading a [filled-in] metadata table
  • Generating a template for manual fill-in

Generating the template is a one-time operation (run manually by the person producing the dataset), while reading the filled-in table may be rerun whenever the postgres metadata table must be regenerated.


We should have a way to know what table a field belongs to


sabinem commented Aug 8, 2023

[Image: Screenshot 2023-08-08 at 14 18 46]

sabinem added 2 commits August 8, 2023 15:11
- the metadata are stored in files `metadata.csv` that belong to the pipeline
- these csv files contain the columns `iten` , `type` (table or field), `language`, `description`
- the function `rw_metadata_table` prefills these files from the datasets, they can then be completed by the user
- if the file already exists, it is read in unless the user explicitly chooses to overwrite it

Later the files can be collected by another function and added as a metadata table in postgres
even though for one pipeline the table name will be the same,
this will not hold when the metadata records are combined in
a general table in the postgres schema

the metadata table in postgres will contain the metadata for all
the datasets
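
To illustrate the point of this commit note, the Python sketch below combines per-pipeline field records into one general list. The function name, the input mapping, and the `table_name` and `field` keys are hypothetical. Without a table name on each record, fields with the same name from different datasets could not be told apart.

```python
def combine_metadata(per_pipeline_records):
    """Merge per-pipeline metadata records into one general list.

    `per_pipeline_records` maps a table name to its list of field records;
    the mapping and the `table_name` key are assumptions for illustration.
    """
    combined = []
    for table_name, records in per_pipeline_records.items():
        for record in records:
            # Build a new dict so the caller's records are not mutated.
            combined.append({"table_name": table_name, **record})
    return combined


# Two datasets that both have a field called "year".
records = combine_metadata({
    "pets": [{"field": "year", "description": "Reference year"}],
    "offences": [{"field": "year", "description": "Year of the offence"}],
})
```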
@sabinem sabinem force-pushed the add-metadata-table branch from afac7e4 to 68a5b9a Compare August 8, 2023 13:11
adapt to what was discussed

there are now two metadata tables and therefore also two files
- one for the table description
- another one for the table column descriptions and column data types

for BFS px cubes the BFS R package is used to get the table
description from the BFS metadata table title

the function was renamed

sabinem commented Aug 9, 2023

Update: the output was adapted according to #50
[Image: Screenshot 2023-08-09 at 15 51 43]

@sabinem sabinem requested a review from philbosch August 9, 2023 13:53
@cmdoret cmdoret self-requested a review August 9, 2023 14:20

@cmdoret cmdoret left a comment


The schema of the table looks good!
One comment: I just realized there's one thing I'm not sure I understand. In metadata_table_columns$data_type, what determines whether we put text or numeric?

For example if I have a column pet_type with values dog and cat, I would assume it is categorical. If I have 120 categories, is it still categorical? This would for example happen in the criminal offence dataset, where the "offence" column has 260 different values.


sabinem commented Aug 9, 2023

@cmdoret I think the distinction between categorical and numeric is generally very easy:

Original datasets

  • all these tables originally have a single data column with numeric values, so that column is numeric
  • all other columns are dimensions: we disregard the year and spatialunit, but every other dimension is always categorical
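
The rule for original datasets could be sketched in Python as follows. The function name, its arguments, and the use of `None` for the two disregarded dimensions are assumptions for illustration; the rule itself (one numeric data column, all remaining dimensions categorical) is taken from the list above.

```python
# Dimensions that are set aside rather than classified, per the rule above.
DISREGARDED_DIMENSIONS = {"year", "spatialunit"}


def classify_columns(columns, value_column):
    """Assign a data type to each column of an original (long) table.

    Assumes the name of the single numeric data column is known up front.
    """
    data_types = {}
    for column in columns:
        if column == value_column:
            data_types[column] = "numeric"
        elif column in DISREGARDED_DIMENSIONS:
            # Left unclassified in this sketch.
            data_types[column] = None
        else:
            # Any other dimension is categorical, however many distinct
            # values it has.
            data_types[column] = "categorical"
    return data_types
```

Under this rule the number of distinct values plays no role, so a dimension with 260 values is classified the same way as one with two.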

Modified datasets

  • when we use the wide-table approach, we sometimes create additional numeric data columns by splitting one categorical column into several columns named after its categorical values: all these new columns hold the original data values and are therefore also numeric
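
A minimal Python sketch of that wide-table step, assuming rows are plain dictionaries. The function name, the parameter names, and the `pet_type` example data are illustrative and not taken from the pipeline.

```python
def pivot_wider(rows, names_from, values_from):
    """Spread one categorical column into one numeric column per category."""
    wide = {}
    for row in rows:
        # Rows that share all remaining dimensions end up in one wide row.
        key = tuple(sorted(
            (k, v) for k, v in row.items() if k not in (names_from, values_from)
        ))
        # The category becomes the column name, the data value its content.
        wide.setdefault(key, dict(key))[row[names_from]] = row[values_from]
    return list(wide.values())


long_rows = [
    {"year": 2022, "pet_type": "dog", "value": 3},
    {"year": 2022, "pet_type": "cat", "value": 5},
    {"year": 2023, "pet_type": "dog", "value": 4},
]
wide_rows = pivot_wider(long_rows, names_from="pet_type", values_from="value")
```

The new `dog` and `cat` columns carry the original data values, which is why they count as numeric in the metadata.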

Does this answer your question?


cmdoret commented Aug 9, 2023

Yes, thank you @sabinem!

@cmdoret cmdoret self-requested a review August 9, 2023 15:26
@sabinem sabinem merged commit fe014e4 into main Aug 9, 2023
@sabinem sabinem deleted the add-metadata-table branch September 18, 2023 08:24