This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

feat(A16): add pipeline #47

Merged
merged 6 commits into from
Aug 7, 2023

@cmdoret cmdoret commented Aug 2, 2023

Adds the pipeline for dataset A16 (#39): Demographic balance by canton

Some columns were dropped as they were uninformative:

  • Acquisition of Swiss citizenship: always had a value of 0.
  • Change of population type: already accounted for in "immigration" and "emigration".
  • Natural change: can easily be obtained as births - deaths.

Observations prior to 1981 were discarded as they only contained a subset of variables, with others set to 0.
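
The two rules above (discarding pre-1981 observations and treating natural change as derivable) can be sketched as follows. This is a minimal illustration on a toy data frame, not the code in scripts/A16.R: the 1981 row reuses the national births and deaths from the glimpse output, the 1980 row is made up to stand in for an incomplete early year, and base R is used only to keep the sketch dependency-free.

```r
# Toy data: the 1981 row reuses the national values from the glimpse output;
# the 1980 row is made up to represent an incomplete early observation.
demo <- data.frame(
  year   = c("1980", "1981"),
  births = c(0, 73747),
  deaths = c(0, 59763),
  stringsAsFactors = FALSE
)

# Keep observations from 1981 onwards (year is stored as character).
kept <- demo[as.integer(demo$year) >= 1981, ]

# natural_change is not stored in the dataset, but can be recomputed on demand.
kept$natural_change <- kept$births - kept$deaths

print(kept)
```

For the 1981 national row this gives 73747 - 59763 = 13984, and 13984 plus the net migration of 23677 equals the reported population change of 37661, which suggests that dropping the column loses no information.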

> dplyr::glimpse(ds$data)
Rows: 1,107
Columns: 12
$ year                             <chr> "1981", "1981", "1981", "1981", "1981…
$ total_population                 <dbl> 6335243, 1120815, 911016, 294421, 335…
$ births                           <dbl> 73747, 12325, 10599, 3747, 438, 1358,…
$ deaths                           <dbl> 59763, 10283, 8862, 2693, 291, 846, 2…
$ immigration                      <dbl> 121420, 23883, 11544, 4025, 421, 1265…
$ in_migration_from_another_canton <dbl> 134359, 17791, 13809, 5965, 477, 2719…
$ emigration                       <dbl> 97743, 19791, 10205, 2980, 361, 911, …
$ out_migration_to_another_canton  <dbl> 134359, 20900, 13148, 6107, 736, 2663…
$ net_migration                    <dbl> 23677, 983, 2000, 903, -199, 410, 293…
$ statistical_adjustment           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ population_change                <dbl> 37661, 3025, 3737, 1957, -52, 922, 46…
$ spatialunit_uid                  <chr> "0_CH", "1_A.ADM1", "2_A.ADM1", "3_A.…

@cmdoret cmdoret changed the title More feat(A16): add pipeline feat(A16): add pipeline Aug 2, 2023
@cmdoret cmdoret linked an issue Aug 2, 2023 that may be closed by this pull request
@cmdoret cmdoret self-assigned this Aug 2, 2023
@cmdoret cmdoret added the dataset label (Proposal for a new dataset) Aug 2, 2023
@cmdoret cmdoret requested a review from sabinem August 2, 2023 15:21

@sabinem sabinem left a comment

I had trouble running the pipeline due to API limits.

Otherwise it looks fine to me.

I have just two comments about what I would leave in, but we can also merge and discuss this with @nooralahzadeh. Therefore I approve the PR.

scripts/A16.R Outdated
janitor::clean_names() %>%
dplyr::filter(
sex == "Sex - total" &
citizenship_category == "Citizenship (category) - total"
Collaborator

I think I would leave the citizenship category in, since it relates to the other columns, whereas gender does not.

For example, for out-migration and in-migration it is interesting to know whether it was Swiss citizens who left the country or came into it from another country, or whether it was people of other nationalities who left or came.
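
One way to read this suggestion is to filter on sex only, so that the citizenship breakdown survives. The sketch below is an assumption about what such a filter could look like, not the change that was eventually committed: the data frame is a toy, the emigration counts are made up, and the "Male" label is a guess at how the non-total sex categories are named.

```r
# Toy rows. The two "- total" labels come from the reviewed filter and
# "Switzerland" / "Foreign country" from the dataset; the "Male" label
# and all emigration counts are made up for illustration.
raw <- data.frame(
  sex = c("Sex - total", "Sex - total", "Sex - total", "Male"),
  citizenship_category = c(
    "Citizenship (category) - total",
    "Switzerland",
    "Foreign country",
    "Switzerland"
  ),
  emigration = c(100, 30, 70, 15),
  stringsAsFactors = FALSE
)

# Filter on sex only, so every citizenship category is kept.
by_citizenship <- raw[raw$sex == "Sex - total", ]

# Drop the aggregate row before summing to avoid double counting.
detail <- by_citizenship[
  by_citizenship$citizenship_category != "Citizenship (category) - total",
]

print(detail)
```

Keeping the aggregate row alongside the per-category rows is convenient for lookups, but any sum over the table then has to exclude it, as the last step does.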

Collaborator Author

True, I will add it! Thanks.

scripts/A16.R Outdated
-change_of_population_type,
-population_on_31_december,
-natural_change,
-acquisition_of_swiss_citizenship
Collaborator

I would leave in acquisition_of_swiss_citizenship, since it is interesting. Redundancy is not an issue here, because the goal is to translate natural-language questions into SQL: whatever term might resemble part of a natural-language question should stay. Also keep in mind that the data in these tables is never complete.

Collaborator Author

@cmdoret cmdoret Aug 7, 2023

This is interesting, but the way this is coded is not intuitive.

I'll make sure to write queries that explicitly use it.

   year  canton                 acquisition_of_swiss_citi…¹ citizenship_category
   <chr> <chr>                                        <dbl> <chr>               
 1 1971  Aargau                                           0 Citizenship (catego…
 2 1971  Aargau                                         746 Switzerland         
 3 1971  Aargau                                        -746 Foreign country     
 4 1971  Appenzell Ausserrhoden                           0 Citizenship (catego…
 5 1971  Appenzell Ausserrhoden                          73 Switzerland         
 6 1971  Appenzell Ausserrhoden                         -73 Foreign country     
 7 1971  Appenzell Innerrhoden                            0 Citizenship (catego…
 8 1971  Appenzell Innerrhoden                           24 Switzerland         
 9 1971  Appenzell Innerrhoden                          -24 Foreign country     
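
A query that uses this column explicitly might look like the sketch below. It copies the nine rows printed above; the full aggregate label "Citizenship (category) - total" is assumed from the filter in scripts/A16.R, since the printed value is truncated. The reading that naturalisations show up as a gain for "Switzerland" and an equal loss for "Foreign country" is an interpretation of those rows, not something the source states.

```r
# Rows copied from the printed tibble (1971, three cantons).
# The aggregate label is truncated in that output; the full string used
# here is assumed to match the label in the scripts/A16.R filter.
acq <- data.frame(
  canton = rep(
    c("Aargau", "Appenzell Ausserrhoden", "Appenzell Innerrhoden"),
    each = 3
  ),
  citizenship_category = rep(
    c("Citizenship (category) - total", "Switzerland", "Foreign country"),
    times = 3
  ),
  acquisition_of_swiss_citizenship = c(0, 746, -746, 0, 73, -73, 0, 24, -24),
  stringsAsFactors = FALSE
)

# The total row nets to 0, so the count has to be read from the
# "Switzerland" rows explicitly.
swiss <- acq[acq$citizenship_category == "Switzerland", ]
naturalisations <- tapply(
  swiss$acquisition_of_swiss_citizenship,
  swiss$canton,
  sum
)

print(naturalisations)
```

Summing the column without selecting a citizenship category returns 0 for every canton, which would explain why the column looked uninformative when only the total category was kept.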

@sabinem sabinem left a comment

Hello Cyril, I just looked at your PR again. Can you please rebase it before you merge? It shows 26 changed files for this pipeline, which seems too much from my perspective.

@cmdoret cmdoret merged commit 0b8715c into main Aug 7, 2023
@cmdoret cmdoret deleted the ds_a16 branch August 30, 2023 13:09
Labels
dataset Proposal for a new dataset
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New dataset: A16 Demographic balance by canton
2 participants