FYI: some haul_id values exceed integer precision of read/write csv in R #49

Open
afredston opened this issue Jun 3, 2024 · 0 comments

afredston commented Jun 3, 2024

Spent a day down this rabbit hole and wanted to share:

In the FISHGLOB dataset, at least some values in the haul_id column are very long strings of digits. They are too long to be stored exactly as numbers by the R functions that read and write CSVs, I think both in base R (read.csv/write.csv) and readr (read_csv/write_csv). This means that if you write out a CSV with a haul_id column and then read it back in, the values will be wrong, even if the column class was "character" when you wrote it out, because the reader guesses the column type from the digits and converts the column to numeric. (See my panicked SO question, from when I figured this out, for a reprex.)
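A minimal sketch of the round trip described above, using base R only. The IDs and the depth column are made up for illustration and are not real FISHGLOB values. The likely mechanism is that the guessed numeric column is stored as a double, which holds only about 15 to 17 significant digits.

```r
# Hypothetical 20-digit IDs, stored as character (not real FISHGLOB values).
ids <- c("12345678901234567890", "12345678901234567891")
df <- data.frame(haul_id = ids, depth = c(10, 20))

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Default read: the column type is guessed, so haul_id comes back as numeric.
back <- read.csv(path)

class(back$haul_id)                        # no longer "character"
format(back$haul_id, scientific = FALSE)   # compare these digits with `ids`
length(unique(back$haul_id))               # distinct IDs may collapse into one value
```

Comparing the formatted values with `ids` shows that the trailing digits no longer match, which is enough to break joins on haul_id.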

There is a simple solution: specify that the column should be read as character rather than numeric when the CSV is read, like so:

hauldat <- read_csv(here("data","haul_data.csv"), col_types = cols(haul_id = col_character())) 
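For anyone using base R rather than readr, the same fix can probably be written with the colClasses argument of read.csv. This is a sketch with made-up IDs and a temporary file, not the actual FISHGLOB haul file.

```r
# Hypothetical long IDs (not real FISHGLOB values).
ids <- c("12345678901234567890", "12345678901234567891")
df <- data.frame(haul_id = ids, depth = c(10, 20))

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Name the column in colClasses so it is kept as text;
# columns that are not named are still type-guessed as usual.
hauldat <- read.csv(path, colClasses = c(haul_id = "character"))

identical(hauldat$haul_id, ids)
```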

This problem also does not occur with other file formats (e.g., RData). So unless we change the formatting of the haul IDs, which causes other issues, we should encourage FISHGLOB users who code in R to save data in other formats.
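As an illustration of the "other formats" route, here is a sketch using saveRDS/readRDS, one of R's native serialization formats. It stores column types along with the data, so nothing is re-guessed on read. The data are again made up.

```r
# Hypothetical long IDs (not real FISHGLOB values).
ids <- c("12345678901234567890", "12345678901234567891")
df <- data.frame(haul_id = ids, depth = c(10, 20))

path <- tempfile(fileext = ".rds")
saveRDS(df, path)

# The character column comes back as character, digit for digit.
back <- readRDS(path)

identical(back$haul_id, ids)
```

The trade-off is that .rds and .RData files are R-specific, so a CSV plus an explicit column type may still be the better choice when sharing data with non-R users.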
