If I import the following RIS file with read_ref(), authors are returned as the first column. This then breaks write_refs(), which writes authors first, so that re-import (at least with read_ref()) fails.
library(synthesisr)
# Download the example RIS file to a temporary location
tmp <- tempfile()
download.file("https://raw.githubusercontent.com/ESHackathon/CiteSource/main/tests/testthat/data/final.ris", tmp)
# Import, export, then try to re-import the exported file
citations <- read_ref(tmp, return_df = TRUE)
write_refs(citations, file = "test-export.ris")
citations <- read_ref("test-export.ris", return_df = TRUE)
#> Error in data.frame(start = which(z_dframe$ris == start_tag), end = end_rows): arguments imply differing number of rows: 101, 17
After quite a lot of troubleshooting, I realized that this is because of Sys.setlocale("LC_ALL", "C"): if that is set, the first characters of the file are read as \357 \273 \277 (the three bytes of a UTF-8 byte order mark), so the TY in that row is no longer recognised. Given that this breaks everything, I wonder whether it would be worth stripping special characters there? Or not using Sys.setlocale at all ... not sure why it is needed and thus if there is a safer workaround.
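One possible way to strip those leading bytes before the tags are matched, as a minimal sketch only: `strip_bom()` is a hypothetical helper (not part of synthesisr) that compares raw bytes, so it should behave the same whatever locale is active.

```r
# Hypothetical helper, not a synthesisr function: remove a leading
# UTF-8 byte order mark (bytes EF BB BF) from a single string.
strip_bom <- function(x) {
  bytes <- charToRaw(x)
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    # Drop the first three bytes and convert the rest back to a string
    rawToChar(bytes[-(1:3)])
  } else {
    x
  }
}

# Build a first line that starts with the byte order mark, then clean it
first_line <- rawToChar(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("TY  - JOUR")))
cleaned <- strip_bom(first_line)
```

Base R connections also accept `encoding = "UTF-8-BOM"` for reading, which removes the mark if present; how that interacts with a "C" locale would need testing.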
Also, as it currently stands, the function silently changes the locale via Sys.setlocale() even if it has been customised before, which is not good practice (arguably against CRAN's guidance not to modify the global environment). So the following might be better - though it needs to be set for each required locale category separately?
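As an illustration of the save-and-restore idea (an assumption about what such a fix could look like, not the code proposed in this issue nor what synthesisr does), a hypothetical helper `with_temp_locale()` can record one locale category, change it, and restore it with `on.exit()`. It handles a single category such as "LC_CTYPE", because the string returned by `Sys.getlocale("LC_ALL")` is system-specific and may not be accepted back by `Sys.setlocale()`.

```r
# Hypothetical helper, not part of synthesisr: evaluate `code` with one
# locale category temporarily changed, then restore the previous setting.
with_temp_locale <- function(category, locale, code) {
  old <- Sys.getlocale(category)
  # Restore the caller's setting even if `code` throws an error
  on.exit(Sys.setlocale(category, old), add = TRUE)
  Sys.setlocale(category, locale)
  # `code` is a lazily evaluated argument, so it runs under the new locale
  force(code)
}

before <- Sys.getlocale("LC_CTYPE")
inside <- with_temp_locale("LC_CTYPE", "C", Sys.getlocale("LC_CTYPE"))
after <- Sys.getlocale("LC_CTYPE")
```

If several categories are needed, the call would have to be repeated (or nested) once per category.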
Just to add - this issue arises whenever the first line in the .ris file is not recognised / is not TY ... then the data frame order is different from the usual, and any export goes awry. Would it be worth either moving 'type' to the front or issuing a warning/error when the first line is not what is expected?
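Both suggestions can be sketched in a few lines. The helpers below are hypothetical (neither exists in synthesisr), and the column name `type` and the start tag `TY` are assumptions taken from the description above.

```r
# Hypothetical check: warn when the first non-empty line of an RIS file
# does not start with the expected tag.
check_first_tag <- function(lines, start_tag = "TY") {
  first <- lines[nzchar(lines)][1]
  ok <- !is.na(first) && startsWith(first, start_tag)
  if (!ok) {
    warning("First line does not start with '", start_tag,
            "'; column order may differ from the usual.", call. = FALSE)
  }
  ok
}

# Hypothetical reorder: move one column to the front of a data frame,
# leaving the data frame unchanged if the column is absent.
move_to_front <- function(df, col = "type") {
  if (!col %in% names(df)) return(df)
  df[, c(col, setdiff(names(df), col)), drop = FALSE]
}

refs <- data.frame(author = "A. Author", type = "JOUR", stringsAsFactors = FALSE)
reordered <- move_to_front(refs)
tag_ok <- check_first_tag(c("TY  - JOUR", "AU  - A. Author"))
```

A warning keeps existing imports working while flagging files whose first line is unexpected; an error would be stricter.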
- use `vroom` for imports to allow `locale` arg (instead of `Sys.setlocale()`, see #24 )
- refactor `parse_bibtex()` to use `unglue()` for brevity and readability
- add `as_tibble()`, for class `bibliography`
- make `add_line_breaks()` backwards-compatible
- separate `parse_` functions for clarity
- ensure `read_refs()` returns a `tibble` when type = "ris"
- add support and tests for read/write roundtripping (#24)
- ensure tests pass