get_municipalities argument codes_as_character seems to not work #46

Open
sampoves opened this issue Jun 29, 2023 · 7 comments

Comments

@sampoves
Contributor

Hello,

It would seem to me that the argument codes_as_character for function get_municipalities does not work in geofi_1.0.9:

codes_as_character is FALSE

> muns1 <- geofi::get_municipalities(codes_as_character = FALSE) %>% 
+   dplyr::select(kunta)
> 
Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2023
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
Warning message:
Coercing CRS to epsg:3067 (ETRS89 / TM35FIN) 
> 
> muns1
Simple feature collection with 309 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 83747.59 ymin: 6637032 xmax: 732907.7 ymax: 7776431
Projected CRS: ETRS89 / TM35FIN(E,N)
First 10 features:
   kunta                           geom
1      5 MULTIPOLYGON (((366787.9 70...
2      9 MULTIPOLYGON (((382543.4 71...
3     10 MULTIPOLYGON (((343298.2 69...
4     16 MULTIPOLYGON (((436139.7 67...
5     18 MULTIPOLYGON (((426631 6720...
6     19 MULTIPOLYGON (((263938.3 67...
7     20 MULTIPOLYGON (((328844.1 67...
8     35 MULTIPOLYGON (((176190.4 67...
9     43 MULTIPOLYGON (((92735.28 67...
10    46 MULTIPOLYGON (((600317.4 69...
> 
> sapply(muns1, class)
$kunta
[1] "integer"

$geom
[1] "sfc_MULTIPOLYGON" "sfc"

codes_as_character is TRUE

> muns2 <- geofi::get_municipalities(year = 2022, codes_as_character = TRUE) %>% 
+   dplyr::select(kunta)
Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2022
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
Warning message:
Coercing CRS to epsg:3067 (ETRS89 / TM35FIN) 
> 
> muns2
Simple feature collection with 309 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 83747.59 ymin: 6637032 xmax: 732907.7 ymax: 7776431
Projected CRS: ETRS89 / TM35FIN(E,N)
First 10 features:
   kunta                           geom
1      5 MULTIPOLYGON (((366787.9 70...
2      9 MULTIPOLYGON (((382543.4 71...
3     10 MULTIPOLYGON (((343298.2 69...
4     16 MULTIPOLYGON (((436139.7 67...
5     18 MULTIPOLYGON (((426631 6720...
6     19 MULTIPOLYGON (((263938.3 67...
7     20 MULTIPOLYGON (((328844.1 67...
8     35 MULTIPOLYGON (((176190.4 67...
9     43 MULTIPOLYGON (((92735.28 67...
10    46 MULTIPOLYGON (((600317.4 69...
> 
> sapply(muns2, class)
$kunta
[1] "integer"

$geom
[1] "sfc_MULTIPOLYGON" "sfc"

Changing the argument value does not introduce leading zeroes to the column kunta and it does not change the column type to character.
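
As a stopgap, the integer kunta codes can be zero-padded by hand. A minimal sketch in base R, using a toy vector in place of the real column; the width of 3 is an assumption based on the three-digit municipality codes discussed in this thread:

```r
# Toy stand-in for muns1$kunta; the real column is an integer vector.
kunta <- c(5L, 9L, 10L, 16L)

# Zero-pad to three characters (assumed width for municipality codes).
kunta_chr <- sprintf("%03d", kunta)

class(kunta_chr)
kunta_chr
```

On the real object the same call would presumably be `muns1$kunta_chr <- sprintf("%03d", muns1$kunta)`, which adds a new column and leaves the integer kunta untouched.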

My session:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=Finnish_Finland.utf8  LC_CTYPE=Finnish_Finland.utf8    LC_MONETARY=Finnish_Finland.utf8 LC_NUMERIC=C                    
[5] LC_TIME=Finnish_Finland.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] geofi_1.0.9    stringr_1.5.0  stringi_1.7.12 readxl_1.4.2   dplyr_1.1.2   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10        cellranger_1.1.0   pillar_1.9.0       compiler_4.2.2     class_7.3-20       tools_4.2.2        odbc_1.3.4        
 [8] digest_0.6.31      bit_4.0.5          lifecycle_1.0.3    tibble_3.2.1       pkgconfig_2.0.3    rlang_1.1.0        DBI_1.1.3         
[15] cli_3.6.0          writexl_1.4.2      curl_5.0.0         yaml_2.3.7         e1071_1.7-12       withr_2.5.0        httr_1.4.5        
[22] xml2_1.3.3         generics_0.1.3     vctrs_0.6.2        hms_1.1.2          classInt_0.4-9     bit64_4.0.5        grid_4.2.2        
[29] tidyselect_1.2.0   glue_1.6.2         sf_1.0-12          R6_2.5.1           fansi_1.0.4        purrr_1.0.1        blob_1.2.3        
[36] magrittr_2.0.3     ellipsis_0.3.2     units_0.8-1        httpcache_1.2.0    utf8_1.2.2         KernSmooth_2.23-20 proxy_0.4-27

Additionally, what's peculiar is that the command geofi::get_municipalities(codes_as_character = FALSE) works without any specific year, but codes_as_character = TRUE requires an explicit year argument: geofi::get_municipalities(year = 2022, codes_as_character = TRUE). This is obviously a separate matter; I will open an issue for it too if I find the time.

@pitkant
Member

pitkant commented Jun 29, 2023

I was the one who fixed issue #38 with PR #39. IIRC the kunta column is used in some join operations, which is why it needs to stay an integer. What codes_as_character = TRUE affects are the various *_code columns, such as municipality_code in this case.

Maybe this could be fixed by either

  1. coercing the municipality codes from the other source (MML?) from which data is joined to character format, or
  2. hiding the kunta column somewhere else than as the 2nd column of the Simple feature collection.

What does @muuankarski think?
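
To see which code columns the argument actually touched, the column classes can be inspected directly. The sketch below uses a toy data frame as a stand-in for the function's output, so the values are assumptions; only the column names kunta and municipality_code come from this thread:

```r
# Toy stand-in for the output of get_municipalities(codes_as_character = TRUE).
muns <- data.frame(
  kunta = c(5L, 9L, 10L),
  municipality_code = c("005", "009", "010"),
  stringsAsFactors = FALSE
)

# Select kunta plus every column ending in "_code" and report its class.
code_cols <- grep("(^kunta$|_code$)", names(muns), value = TRUE)
col_classes <- vapply(muns[code_cols], function(x) class(x)[1], character(1))
col_classes
```

With a real sf object, dropping the geometry first (for example with `sf::st_drop_geometry()`) keeps the geometry column out of the result.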

pitkant added a commit that referenced this issue Jun 29, 2023
@pitkant
Member

pitkant commented Jun 29, 2023

Additionally, what's peculiar is that the command geofi::get_municipalities(codes_as_character = FALSE) works without any specific year, but codes_as_character = TRUE requires an explicit year argument: geofi::get_municipalities(year = 2022, codes_as_character = TRUE). This is obviously a separate matter; I will open an issue for it too if I find the time.

This was actually caused by a missing sairaanhoitop_code column in the data for the most recent year, 2023, which the function defaults to when year is not given explicitly. I wrote a patch that makes the function more robust to differences in column names between years.
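
One way to get that kind of robustness is to convert only the code columns that exist in a given year's data. This is a sketch of the general idea, not the actual patch: the helper name `pad_existing_codes` and the default width of 3 are hypothetical.

```r
# Hypothetical helper, not geofi's actual code: zero-pad only the code
# columns present in this year's data and silently skip the missing ones.
pad_existing_codes <- function(df, cols, width = 3) {
  present <- intersect(cols, names(df))
  for (col in present) {
    df[[col]] <- formatC(df[[col]], width = width, flag = "0")
  }
  df
}

# Toy 2023-like data that lacks the sairaanhoitop_code column.
d2023 <- data.frame(municipality_code = c(5L, 9L, 10L))
out <- pad_existing_codes(d2023, c("municipality_code", "sairaanhoitop_code"))
out$municipality_code
```

Using `intersect()` means a column missing from one year's data is ignored rather than raising an error, which is the failure mode described above.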

@pitkant pitkant mentioned this issue Jun 29, 2023
@sampoves
Contributor Author

Thank you @pitkant for your fast reply. I have to admit that I was not aware of municipality_code due to RStudio playing tricks on me (not showing any columns past id 52 🤨).

municipality_code is already working the way I would expect in geofi_1.0.9. Nevertheless, many thanks for looking into this matter!

muuankarski added a commit that referenced this issue Oct 31, 2023
@sampoves
Contributor Author

sampoves commented Nov 2, 2023

Hello,

Unfortunately, codes_as_character still does not work in geofi_1.0.10. I am sorry to bring this up again, and as always, thank you geofi maintainers!

> geofi::get_municipalities(codes_as_character = TRUE) %>% 
+   dplyr::select(kunta)

Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2023
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
Simple feature collection with 309 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 83747.59 ymin: 6637032 xmax: 732907.7 ymax: 7776431
Projected CRS: ETRS89 / TM35FIN(E,N)
First 10 features:
   kunta                           geom
1      5 MULTIPOLYGON (((366787.9 70...
2      9 MULTIPOLYGON (((382543.4 71...
3     10 MULTIPOLYGON (((343298.2 69...
4     16 MULTIPOLYGON (((436139.7 67...
5     18 MULTIPOLYGON (((426631 6720...
6     19 MULTIPOLYGON (((263938.3 67...
7     20 MULTIPOLYGON (((328844.1 67...
8     35 MULTIPOLYGON (((176190.4 67...
9     43 MULTIPOLYGON (((92735.28 67...
10    46 MULTIPOLYGON (((600317.4 69...
Warning message:
Coercing CRS to epsg:3067 (ETRS89 / TM35FIN) 

Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2023
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
Warning message:
Coercing CRS to epsg:3067 (ETRS89 / TM35FIN) 

> sapply(muns, class)
$kunta
[1] "integer"

$geom
[1] "sfc_MULTIPOLYGON" "sfc"

Current environment is this:

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=Finnish_Finland.utf8  LC_CTYPE=Finnish_Finland.utf8    LC_MONETARY=Finnish_Finland.utf8
[4] LC_NUMERIC=C                     LC_TIME=Finnish_Finland.utf8    

time zone: Europe/Helsinki
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] geofi_1.0.10   sf_1.0-14      odbc_1.3.5     stringr_1.5.0  stringi_1.7.12 readxl_1.4.3  
[7] dplyr_1.1.2   

loaded via a namespace (and not attached):
 [1] bit_4.0.5          compiler_4.3.1     tidyselect_1.2.0   Rcpp_1.0.11        xml2_1.3.5        
 [6] blob_1.2.4         yaml_2.3.7         R6_2.5.1           generics_0.1.3     curl_5.0.2        
[11] classInt_0.4-10    tibble_3.2.1       units_0.8-3        DBI_1.1.3          pillar_1.9.0      
[16] rlang_1.1.1        utf8_1.2.3         bit64_4.0.5        cli_3.6.1          withr_2.5.0       
[21] magrittr_2.0.3     class_7.3-22       digest_0.6.33      grid_4.3.1         httpcache_1.2.0   
[26] rstudioapi_0.15.0  hms_1.1.3          lifecycle_1.0.3    vctrs_0.6.3        writexl_1.4.2     
[31] KernSmooth_2.23-21 proxy_0.4-27       glue_1.6.2         cellranger_1.1.0   fansi_1.0.4       
[36] e1071_1.7-13       purrr_1.0.2        httr_1.4.7         tools_4.3.1        pkgconfig_2.0.3  

@muuankarski
Collaborator

muuankarski commented Nov 2, 2023 via email

@pitkant
Member

pitkant commented Nov 3, 2023

@sampoves Yes, it seems that codes_as_character converts only certain fields, the ones meant to be character codes, and not all of them. See PR #39 for a list of which codes are 3-character strings, which are 2-character strings, and which stay integers. From the commit history you can see that I originally thought kunta could also be a character, but apparently that breaks some join operations somewhere in the package or its vignettes, so it was changed back to an integer in all cases. Maybe @muuankarski can shed light on this.

You could use municipality_code in join operations instead of kunta. Just make sure that codes_as_character = TRUE; with codes_as_character = FALSE, municipality_code will also be an integer.
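
A minimal sketch of such a join in base R, with toy data frames standing in for the real objects. The `stats` table and its `population` values are made up for illustration:

```r
# Toy stand-in for get_municipalities(codes_as_character = TRUE).
muns <- data.frame(
  municipality_code = c("005", "009", "010"),
  kunta = c(5L, 9L, 10L),
  stringsAsFactors = FALSE
)

# Hypothetical statistics table keyed by zero-padded character codes.
stats <- data.frame(
  municipality_code = c("005", "010"),
  population = c(100, 200),  # made-up values
  stringsAsFactors = FALSE
)

# Left join: keep every municipality, attach statistics where codes match.
joined <- merge(muns, stats, by = "municipality_code", all.x = TRUE)
joined
```

Both key columns must be character for the codes to match; an integer 5 on one side would not pair up with "005" on the other. With a real sf object, `dplyr::left_join(muns, stats, by = "municipality_code")` is the usual equivalent.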

The argument documentation could also be clarified, or the function could be changed so that all codes really are characters?

@sampoves
Contributor Author

sampoves commented Nov 6, 2023

Hello @muuankarski and @pitkant, many thanks for your replies.

We've been through this before, and I have to admit that I simply didn't remember what we had discussed. Apologies, and thank you for the friendly reminder about the fields. As you said, municipality_code is in fact a character column when codes_as_character = TRUE.

I do think it is a tiny bit confusing to have a prominent field, kunta, which does not change under any circumstances, while the fields that are actually affected are buried deep in the wide output data frame. I think clarifying the argument in the documentation would be a helpful step forward.

Many thanks for the swift communication and maintenance of geofi.
