Still problems writing long strings to sav files #346
I hope I did that right.

Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run it in a local session.
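For reference, a reprex can be produced by wrapping the code in reprex::reprex() (a sketch, assuming the reprex package is installed; it runs the code in a clean session and renders code plus output, ready to paste here):

```r
# install.packages("reprex")  # if needed
library(reprex)

# Sketch: reprex() evaluates the expression in a fresh R session and
# copies a rendered version (code interleaved with output) to the clipboard.
reprex({
  library(haven)
  long <- paste0(rep(letters, 100), collapse = "")
  df <- data.frame(x = long, stringsAsFactors = FALSE)
  path <- tempfile()
  write_sav(df, path)
  identical(as.character(read_sav(path)$x), long)
})
```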
This seems fine to me currently:

library(haven)
long <- paste0(rep(letters, 100), collapse = "")
df <- data.frame(x = long, stringsAsFactors = FALSE)
path <- tempfile()
write_sav(df, path)
df2 <- read_sav(path)
df2$x == long
#> [1] TRUE

Could you please try and create a reprex in that style? (i.e. generating the problematic data rather than downloading it from elsewhere)
Sorry, didn't realise a new issue had been opened. But basically this still seems to be #266. The problem is not round-tripping with haven, but that SPSS doesn't open the file. I think it's still the same problem you can also see with the minimal reprex. This is after installing the latest haven from master. You don't have SPSS to test, right?

library(haven)
n <- 256
df <- data.frame(long = paste(rep("a", n), collapse = ""), stringsAsFactors = FALSE)
write_sav(df, path = "test.sav")
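For what it's worth, a quick sketch that round-trips strings straddling the 255-character boundary (my addition, assuming haven is installed; haven itself should preserve all of them, the open question is whether SPSS can read the files written for the longer ones):

```r
library(haven)

# Sketch: round-trip strings whose lengths straddle the 255-character
# boundary. Lengths above 255 are where the SPSS long-string handling
# kicks in, so these are the interesting cases to hand to SPSS.
for (n in c(254, 255, 256, 300)) {
  df <- data.frame(x = paste(rep("a", n), collapse = ""),
                   stringsAsFactors = FALSE)
  path <- tempfile(fileext = ".sav")
  write_sav(df, path)
  stopifnot(as.character(read_sav(path)$x) == df$x)
}
```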
@rubenarslan thanks! It wasn't clear that this was the problem. Can you please confirm that you're using the latest development version of haven (i.e. you installed it in the last 12 hours)?
Yes!
@evanmiller looks like another one for you. @rubenarslan it might be useful for you to create a
I uploaded those files back in #266. Maybe just reopen the issue?
Good idea - I'll clean up the discussion, which got sidetracked there.
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
I am still having problems with this. I am sure I am using the development version of haven.
There are quite a few variables in this dataset that have more than 255 characters. The following script 1) downloads the file I am working with, 2) writes it out via haven, 3) subsets variables with less than 255 characters in the values, and 4) writes that data set out for comparison.
Note: it requires the dataverse package to load the file and sets a relevant system variable. Sorry about that; I can't figure out how to make this work without that.

#setwd()
#If necessary, install dataverse
#install.packages('dataverse')
#Load dataverse
library(dataverse)
#Warning: This changes two environment variables that are necessary to search dataverse with the dataverse package; I haven't been able to write this script where these are only changed locally. It's not a huge deal, but be aware.
Sys.setenv("DATAVERSE_SERVER" = "dataverse.scholarsportal.info")
Sys.setenv("DATAVERSE_KEY" = "e66bfc71-7665-40bf-83c2-b7e5a6dc2c33")
#Get the problematic file as a binary file (I think)
out <- get_file('second_survey.tab', 'hdl:10864/10985', 'original')
#> Warning in strptime(x, fmt, tz = "GMT"): unknown timezone 'zone/tz/2018c.
#> 1.0/zoneinfo/America/Toronto'
#> Warning in strptime(x, fmt, tz = "GMT"): unknown timezone 'zone/tz/2018c.
#> 1.0/zoneinfo/America/Toronto'
#Write it out in SPSS format
writeBin(out, 'out.sav')
#AFAIK I have the development version of haven installed
library(haven)
#Read the sav file in
read_out <- read_sav('out.sav')
#Now Write it out and test
write_sav(read_out, 'write_out.sav')
#Two variables that cause problems are
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.4.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
read_out %>%
select(contains('storetime'))
#> # A tibble: 5,309 x 2
#> storetime R_storetime
#>    <chr>                                <chr>
#> 1 BrowserName_Capture:NaN|QINF0:1|INT… ""
#> 2 BrowserName_Capture:NaN|QINF0:1|INT… ""
#> 3 BrowserName_Capture:NaN|QINF0:3|INT… ""
#> 4 BrowserName_Capture:NaN|QINF0:2|INT… BrowserName_Capture:NaN|QINF0:2|I…
#> 5 "" BrowserName_Capture:NaN|QINF0:1|I…
#> 6 BrowserName_Capture:NaN|QINF0:1|INT… ""
#> 7 "" BrowserName_Capture:NaN|QINF0:2|I…
#> 8 BrowserName_Capture:NaN|QINF0:1|INT… ""
#> 9 BrowserName_Capture:NaN|QINF0:1|INT… ""
#> 10 BrowserName_Capture:NaN|QINF0:0|INT… BrowserName_Capture:NaN|QINF0:1|I…
#> # ... with 5,299 more rows
#When I delete all variables that have string values longer than 255 characters, the sav file that is produced is fine.
read_out %>%
select(which(apply(., 2, function(x) max(nchar(x, keepNA = FALSE))) < 255)) %>%
write_sav(., 'write_out_subset_less_than_255.sav')
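On the environment-variable caveat above: one way to keep those settings local is withr::with_envvar(), which sets the variables only for the duration of the call and restores the previous values afterwards (a sketch, assuming the withr package; untested against this Dataverse server):

```r
library(withr)
library(dataverse)

# Sketch: scope the two Dataverse settings to this one call, so the
# caller's environment variables are left untouched afterwards.
out <- with_envvar(
  c(DATAVERSE_SERVER = "dataverse.scholarsportal.info",
    DATAVERSE_KEY    = "e66bfc71-7665-40bf-83c2-b7e5a6dc2c33"),
  get_file("second_survey.tab", "hdl:10864/10985", "original")
)
```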