Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import issue (due to Sys.setlocale) #24

Open
LukasWallrich opened this issue Dec 19, 2022 · 1 comment
Open

Import issue (due to Sys.setlocale) #24

LukasWallrich opened this issue Dec 19, 2022 · 1 comment

Comments

@LukasWallrich
Copy link

If I import the following RIS file with read_ref(), authors are returned as the first column - which then breaks write_refs(), where authors are written first, so that re-import (at least with read_ref) fails.

library(synthesisr)

tmp <- tempfile()

download.file("https://raw.githubusercontent.com/ESHackathon/CiteSource/main/tests/testthat/data/final.ris", tmp)

citations <- read_ref(tmp, return_df = TRUE)
write_refs(citations, file = "test-export.ris")
citations <- read_ref("test-export.ris", return_df = TRUE)
#> Error in data.frame(start = which(z_dframe$ris == start_tag), end = end_rows): arguments imply differing number of rows: 101, 17

After quite a lot of troubleshooting, I realized that this is because of Sys.setlocale("LC_ALL", "C") - if that is set, the first characters of the file are read as \357 \273 \277 so that the TY in that row is no longer recognised. Given that this breaks everything, I wonder whether it would be worth stripping special characters there? Or not using Sys.setlocale at all ... not sure why it is needed and thus if there is a safer workaround.

Also, as it currently stands, the function silently changes Sys.setlocate if it has been customised before - which is not good practice (arguably against CRAN's guidance not to modify the global environment). So the following might be better - though it needs to be set for each require locale type separately?

   old_loc <- Sys.getlocale("LC_CTYPE")
   invisible(Sys.setlocale("LC_CTYPE", "C"))
    on.exit(invisible(Sys.setlocale("LC_CTYPE", old_loc)))
@LukasWallrich
Copy link
Author

Just to add - this issue arises whenever the first line in the .ris file is not recognised / is not TY ... then the data frame order is different from the usual, and any export goes awry. Would it be worth either moving 'type' to the front or issuing a warning/error when the first line is not what is expected?

mjwestgate added a commit that referenced this issue Jul 9, 2023
- use `vroom` for imports to allow `locale` arg (instead of `Sys.setlocale()`, see #24 )
- refactor `parse_bibtex()` to use `unglue()` for > brevity and readability
- add `as_tibble()`, for class `bibliography`
- make `add_line_breaks()` backwards-compatible
mjwestgate added a commit that referenced this issue Feb 9, 2024
- seperate `parse_` functions for clarity
- ensure read_refs() returns a `tibble` when type = "ris"
- add support and tests for read/write roundtripping (#24)
- ensure tests pass
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant