function saveXML error #6

Open
emiliesecherre opened this issue Apr 10, 2020 · 12 comments
Comments

@emiliesecherre

emiliesecherre commented Apr 10, 2020

Hello,
When trying to get the BioPAX file for the pathway R-HSA-1369062, I got this error:

`

> idx <- which(grepl("reactome", simplifiedSearchResultsDf$uri) & grepl("ABC transporters", simplifiedSearchResultsDf$name, ignore.case = TRUE))
> idx
[1] 1
> uri <- simplifiedSearchResultsDf$uri[idx]
> uri
[1] http://identifiers.org/reactome/R-HSA-1369062
100 Levels: http://identifiers.org/reactome/R-HSA-1369062 ... http://identifiers.org/reactome/R-HSA-165054
> saveXML(getPc(uri, format = "BIOPAX"), biopaxFile)
Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x3C 0x2F 0x62
`

@cannin
Collaborator

cannin commented Apr 10, 2020

Thanks for the reproducible bug report. It seems to be working for me. Here is what I ran. What versions of paxtoolsr and XML are you running?

> library(paxtoolsr)
Loading required package: rJava
Loading required package: XML
Consider citing this package: Luna A, et al. PaxtoolsR: pathway analysis in R using Pathway Commons. PMID: 26685306; citation("paxtoolsr")
> library(XML)
> 
> uri <- "http://identifiers.org/reactome/R-HSA-1369062"
> xml <- getPc(uri, format = "BIOPAX")
> 
> saveXML(xml, "del.xml")
[1] "del.xml"
> 
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] paxtoolsr_1.21.1 XML_3.98-1.20    rJava_0.9-11    

loaded via a namespace (and not attached):
 [1] igraph_1.2.4.2       Rcpp_1.0.3           rstudioapi_0.10      magrittr_1.5        
 [5] knitr_1.26           hms_0.5.3            rjson_0.2.20         R6_2.4.1            
 [9] rlang_0.4.2          plyr_1.8.5           httr_1.4.1.9000      tools_3.6.2         
[13] xfun_0.11            R.oo_1.23.0          htmltools_0.4.0.9002 yaml_2.2.0          
[17] digest_0.6.23        tibble_2.1.3         crayon_1.3.4         readr_1.3.1         
[21] vctrs_0.2.1          R.utils_2.9.2        curl_4.3             zeallot_0.1.0       
[25] evaluate_0.14        rmarkdown_2.0        compiler_3.6.2       pillar_1.4.3        
[29] backports_1.1.5      R.methodsS3_1.7.1    jsonlite_1.6.9000    pkgconfig_2.0.3  

@emiliesecherre
Author

emiliesecherre commented Apr 10, 2020

Here is my session info:
`
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_0.8.5 MPINet_1.0 mgcv_1.8-31 nlme_3.1-145
[5] BiasedUrn_1.07 lilikoi_0.1.0 pathfindR_1.4.2 stringr_1.4.0
[9] gprofiler2_0.1.8 plyr_1.8.6 paxtoolsr_1.20.0 XML_3.99-0.3
[13] rJava_0.9-12 clusterProfiler_3.14.3 rWikiPathways_1.6.1 graphite_1.32.0

loaded via a namespace (and not attached):
[1] backports_1.1.6 Hmisc_4.4-0 fastmatch_1.1-0 corrplot_0.84 igraph_1.2.5
[6] lazyeval_0.2.2 splines_3.6.3 BiocParallel_1.20.1 ggplot2_3.3.0 urltools_1.7.3
[11] digest_0.6.25 foreach_1.5.0 htmltools_0.4.0 GOSemSim_2.12.1 viridis_0.5.1
[16] GO.db_3.10.0 fansi_0.4.1 magrittr_1.5 checkmate_2.0.0 memoise_1.1.0
[21] cluster_2.1.0 doParallel_1.0.15 recipes_0.1.10 readr_1.3.1 graphlayouts_0.6.0
[26] gower_0.2.1 R.utils_2.9.2 enrichplot_1.6.1 prettyunits_1.1.1 jpeg_0.1-8.1
[31] princurve_2.1.4 colorspace_1.4-1 blob_1.2.1 rappdirs_0.3.1 ggrepel_0.8.2
[36] xfun_0.12 crayon_1.3.4 RCurl_1.98-1.1 RWeka_0.4-42 jsonlite_1.6.1
[41] graph_1.64.0 survival_3.1-12 iterators_1.0.12 glue_1.4.0 polyclip_1.10-0
[46] gtable_0.3.0 ipred_0.9-9 BiocGenerics_0.32.0 scales_1.1.0 DOSE_3.12.0
[51] infotheo_1.2.0 DBI_1.1.0 Rcpp_1.0.4 htmlTable_1.13.3 viridisLite_0.3.0
[56] progress_1.2.2 gridGraphics_0.5-0 foreign_0.8-76 bit_1.1-15.2 europepmc_0.3
[61] Formula_1.2-3 lava_1.6.7 prodlim_2019.11.13 stats4_3.6.3 htmlwidgets_1.5.1
[66] httr_1.4.1 fgsea_1.12.0 RColorBrewer_1.1-2 acepack_1.4.1 ellipsis_0.3.0
[71] pkgconfig_2.0.3 R.methodsS3_1.8.0 farver_2.0.3 nnet_7.3-13 utf8_1.1.4
[76] caret_6.0-86 ggplotify_0.0.5 tidyselect_1.0.0 rlang_0.4.5 reshape2_1.4.4
[81] AnnotationDbi_1.48.0 munsell_0.5.0 tools_3.6.3 cli_2.0.2 generics_0.0.2
[86] RSQLite_2.2.0 ggridges_0.5.2 evaluate_0.14 ModelMetrics_1.2.2.2 knitr_1.28
[91] bit64_0.9-7 tidygraph_1.1.2 caTools_1.18.0 purrr_0.3.3 ggraph_2.0.2
[96] R.oo_1.23.0 DO.db_2.9 xml2_1.3.1 compiler_3.6.3 rstudioapi_0.11
[101] png_0.1-7 plotly_4.9.2.1 curl_4.3 tibble_3.0.0 tweenr_1.0.1
[106] stringi_1.4.6 lattice_0.20-41 Matrix_1.2-18 gbm_2.1.5 RWekajars_3.9.3-2
[111] vctrs_0.2.4 pillar_1.4.3 lifecycle_0.2.0 BiocManager_1.30.10 triebeard_0.3.0
[116] data.table_1.12.8 cowplot_1.0.0 bitops_1.0-6 qvalue_2.18.0 latticeExtra_0.6-29
[121] R6_2.4.1 gridExtra_2.3 IRanges_2.20.2 codetools_0.2-16 MASS_7.3-51.5
[126] assertthat_0.2.1 rjson_0.2.20 withr_2.1.2 S4Vectors_0.24.3 parallel_3.6.3
[131] hms_0.5.3 grid_3.6.3 rpart_4.1-15 timeDate_3043.102 tidyr_1.0.2
[136] class_7.3-16 rmarkdown_2.1 rvcheck_0.1.8 pROC_1.16.2 ggforce_0.3.1
[141] base64enc_0.1-3 lubridate_1.7.8 Biobase_2.46.0
`

When I try your code, I get this:
`

> uri <- "http://identifiers.org/reactome/R-HSA-1369062"
> xml <- getPc(uri, format = "BIOPAX")
Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x3C 0x2F 0x62
Error: 1: Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x3C 0x2F 0x62
> saveXML(xml, "del.xml")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘saveXML’ for signature ‘"function"’
`

@emiliesecherre
Author

emiliesecherre commented Apr 10, 2020

I also noticed that this error appears when I try to get BioPAX from Reactome and SMPDB; it works with Panther and Pathway Commons.

@cannin
Collaborator

cannin commented Apr 10, 2020

I updated my XML package (to XML_3.99-0.3) without any new problems, and I don't think my small update to paxtoolsr would have affected XML. This is a strange error because getPc() should give you an XML package object, so saving it shouldn't be a problem. Can you try any of the following to see what is in the variable returned from getPc()?

library(paxtoolsr)
library(XML)
library(xml2)

uri <- "http://identifiers.org/reactome/R-HSA-1369062"
xml <- getPc(uri, format = "BIOPAX")
saveXML(xml, "del.xml")
str(xml)

tmp <- XML::toString.XMLNode(xml)
writeLines(as.character(tmp), "del_xml.txt")

s <- xml2::read_xml(tmp)
writeLines(as.character(s), "del_xml2.txt")

sessionInfo()

@emiliesecherre
Author

The issue is the same: because getPc() fails with an error, the xml variable is never created, so the rest of the code doesn't work either.

@cannin
Collaborator

cannin commented Apr 10, 2020

What about this? The package functions do various routine things to generate the link below and then read in the XML. The code below isolates the main step: fetching the response as text.

library(httr)

req <- GET('http://www.pathwaycommons.org/pc2/get?uri=http%3A%2F%2Fidentifiers.org%2Freactome%2FR-HSA-1369062&format=BIOPAX')
text <- content(req, "text")
str(text)
writeLines(text, "del.txt")

@emiliesecherre
Author

It worked! I think I understand where the issue was: the BioPAX file contained a lot of accented European characters (à, é, è), so I had to remove them to make the toSif function work. So I just have to change the Reactome identifier in the URL passed to GET if I want to analyze another pathway?

@cannin
Collaborator

cannin commented Apr 10, 2020

Short answer: Okay. Yes, you can change the URL and only use the parts of paxtoolsr that you need. Ultimately, paxtoolsr functions are a set of "opinions" and "helpers" for how to best use the underlying API that you are accessing with that URL within R.

Longer answer: Can you paste the "bad" XML you get as a Gist (https://gist.github.com/)? There might be a simple solution for Windows, but I don't have easy access to a Windows machine. It might be as easy as an additional parameter I need to add to one function.
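
For changing the pathway in that URL, a minimal base-R sketch follows. It assumes the query layout seen in the GET call earlier in this thread (`uri` and `format` parameters); `build_pc_get_url` is a hypothetical helper written for illustration, not a paxtoolsr function.

```r
# Hypothetical helper (not part of paxtoolsr): rebuild the Pathway Commons
# "get" URL used earlier in this thread for an arbitrary pathway URI.
build_pc_get_url <- function(uri, format = "BIOPAX",
                             base = "http://www.pathwaycommons.org/pc2/get") {
  # reserved = TRUE also percent-encodes ":" and "/" inside the URI
  encoded_uri <- utils::URLencode(uri, reserved = TRUE)
  paste0(base, "?uri=", encoded_uri, "&format=", format)
}

url <- build_pc_get_url("http://identifiers.org/reactome/R-HSA-1369062")
print(url)
```

The resulting string can be passed to `httr::GET()` as before. When reading the body with `httr::content(req, "text")`, passing `encoding = "UTF-8"` explicitly may avoid encoding guesses, assuming the server really returns UTF-8.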

@emiliesecherre
Author

emiliesecherre commented Apr 14, 2020

Hello,
I would be glad to, but I'm not familiar with how Gists work, so here is a Drive link instead:
https://drive.google.com/file/d/1wOoqNXrZyZTzQ_3mI2YDQSwUO4MiZTME/view?usp=sharing . To be more precise, it was some characters in authors' names that caused the issues. Thank you for your help!

@cannin
Collaborator

cannin commented Apr 14, 2020

Thanks. What do you see if you run this:

Sys.getlocale('LC_CTYPE')

For me, the return value is "en_US.UTF-8".

Does this Stack Overflow answer about parsing XML in R with non-Latin characters help with the original getPc command? https://stackoverflow.com/questions/38612603/encoding-issue-when-parsing-xml-in-r

Here is more information on locales: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/locales
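
The bytes in the error message give a hint. The base-R sketch below only inspects the four reported bytes. It assumes, without confirmation from this thread, that the text was re-encoded to the native Windows-1252 code page somewhere before parsing. In Latin-1 and Windows-1252, 0xE9 is "é", and 0x3C 0x2F 0x62 is "</b".

```r
# The four bytes reported by the XML parser in this thread.
bad <- rawToChar(as.raw(c(0xE9, 0x3C, 0x2F, 0x62)))

# A lone 0xE9 followed by "<" is not a valid UTF-8 sequence.
is_valid <- validUTF8(bad)

# Interpreted as Latin-1, the same bytes convert cleanly to UTF-8:
# "é" becomes the two bytes 0xC3 0xA9, the rest is unchanged ASCII.
fixed <- iconv(bad, from = "latin1", to = "UTF-8")
fixed_bytes <- charToRaw(fixed)

print(is_valid)
print(fixed_bytes)
```

If that assumption holds, converting the text to UTF-8 before parsing would keep the accented characters, whereas transliterating to ASCII drops them.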

@emiliesecherre
Author

emiliesecherre commented Apr 15, 2020

I got this:
`

> Sys.getlocale('LC_CTYPE')
[1] "French_France.1252"
`

The function stringi::stri_conv() didn't help, and when I try to use Sys.setlocale('LC_CTYPE', 'en_US.UTF-8') I get a warning.
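
A small sketch of how that warning can be checked from code: when the operating system rejects a locale name, `Sys.setlocale()` warns and returns an empty string. Locale names are platform-specific, so "en_US.UTF-8" may simply not exist under that name on a given system; the name here is just the one tried in this thread.

```r
# Remember the current setting so it can be restored afterwards.
old <- Sys.getlocale("LC_CTYPE")

# Try the locale name from this thread; suppress the warning and
# inspect the return value instead ("" means the request was rejected).
res <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_US.UTF-8"))

if (identical(res, "")) {
  message("Locale not available here; still using: ", Sys.getlocale("LC_CTYPE"))
} else {
  # The change succeeded; put the original locale back.
  Sys.setlocale("LC_CTYPE", old)
}
```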

@emiliesecherre
Author

emiliesecherre commented Apr 15, 2020

If it helps, I used iconv(text, from = 'UTF-8', to = 'ASCII//TRANSLIT') and it seems to fix most of the character issues! I still get the following message, though (unrelated, I think):

2020-04-15 19:12:44,932 343127 [main] INFO org.biopax.paxtools.PaxtoolsMain - toSif: not blacklisting ubiquitous molecules (no blacklist.txt found)

Is that a problem? I get SIF files even with this message.
However, some BioPAX files don't produce SIF files, and I don't really understand why.
