GDCprepare Error adding clinical information to samples #629

Open
IvanEllson opened this issue May 16, 2024 · 6 comments

Comments

@IvanEllson

IvanEllson commented May 16, 2024

Hello,

I have encountered an error using GDCprepare with gene expression data from BEATAML1.0-COHORT while trying to add clinical information to samples:

Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 0) to match `..2` (size 2).

The error appears with the latest TCGAbiolinks versions (2.32.0 and 2.31.4), but not with older versions such as 2.28.4.
It can be reproduced by running the following code (rlang::last_trace() output included):

> library(TCGAbiolinks)
> query_BEATAML1.0COHORT <- GDCquery(project = "BEATAML1.0-COHORT",
+                                    data.category = "Transcriptome Profiling",
+                                    data.type = "Gene Expression Quantification",
+                                    data.format = "tsv")
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: BEATAML1.0-COHORT
--------------------
oo Filtering results
--------------------
ooo By data.format
ooo By data.type
----------------
oo Checking data
----------------
ooo Checking if there are duplicated cases
ooo Checking if there are results for the query
-------------------
o Preparing output
-------------------
> 
> GDCdownload(query_BEATAML1.0COHORT)
Downloading data for project BEATAML1.0-COHORT
Of the 735 files for download 735 already exist.
All samples have been already downloaded
> 
> data_BEATAML1.0COHORT <- GDCprepare(query_BEATAML1.0COHORT)
|==================================================================================================================================================|100%                      Completed after 6 s 
Starting to add information to samples
 => Add clinical information to samples
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 0) to match `..2` (size 2).
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/vctrs_error_incompatible_size>
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 0) to match `..2` (size 2).
---
Backtrace:
     ▆
  1. └─TCGAbiolinks::GDCprepare(query_BEATAML1.0COHORT)
  2.   └─TCGAbiolinks:::readTranscriptomeProfiling(...)
  3.     └─TCGAbiolinks:::makeSEfromTranscriptomeProfilingSTAR(...)
  4.       └─TCGAbiolinks::colDataPrepare(cases)
  5.         └─TCGAbiolinks:::splitAPICall(...)
  6.           └─base::tryCatch(...)
  7.             └─base (local) tryCatchList(expr, classes, parentenv, handlers)
  8.               └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
  9.                 └─value[[3L]](cond)
 10.                   └─TCGAbiolinks (local) FUN(items[start:end])
 11.                     └─dplyr::bind_cols(df %>% as.data.frame, diagnoses %>% as.data.frame)
Run rlang::last_trace(drop = FALSE) to see 5 hidden frames.
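
The last backtrace frame is a dplyr::bind_cols() call on two data frames. As a minimal sketch of that failure mode only, the snippet below binds a zero-row data frame to a two-row one. The column names and values are hypothetical toy data, not what TCGAbiolinks builds internally; the point is just that bind_cols() requires matching row counts and raises the same error class shown in the trace.

```r
library(dplyr)

# Hypothetical stand-ins for the two tables passed to bind_cols() in frame 11.
df <- data.frame(submitter_id = character(0))               # zero rows
diagnoses <- data.frame(primary_diagnosis = c("a", "b"))    # two rows

# Capture the condition instead of letting it abort the script.
err <- tryCatch(
  bind_cols(df, diagnoses),
  error = function(e) e
)

print(class(err)[1])
print(conditionMessage(err))
```

If the first table comes back empty for any reason, the bind fails regardless of what the second table contains.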

And my R session info:

> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.32.0

loaded via a namespace (and not attached):
 [1] writexl_1.5.0               tidyselect_1.2.1            dplyr_1.1.4                 blob_1.2.4                  filelock_1.0.3              Biostrings_2.72.0          
 [7] fastmap_1.2.0               BiocFileCache_2.12.0        XML_3.99-0.16.1             digest_0.6.35               lifecycle_1.0.4             KEGGREST_1.44.0            
[13] RSQLite_2.3.6               magrittr_2.0.3              compiler_4.4.0              rlang_1.1.3                 progress_1.2.3              tools_4.4.0                
[19] utf8_1.2.4                  data.table_1.15.4           knitr_1.46                  prettyunits_1.2.0           S4Arrays_1.4.0              bit_4.0.5                  
[25] curl_5.2.1                  DelayedArray_0.30.1         plyr_1.8.9                  xml2_1.3.6                  librarian_1.8.1             abind_1.4-5                
[31] withr_3.0.0                 purrr_1.0.2                 BiocGenerics_0.50.0         grid_4.4.0                  stats4_4.4.0                fansi_1.0.6                
[37] colorspace_2.1-0            ggplot2_3.5.1               scales_1.3.0                biomaRt_2.60.0              SummarizedExperiment_1.34.0 cli_3.6.2                  
[43] crayon_1.5.2                generics_0.1.3              rstudioapi_0.16.0           bcellViper_1.40.0           httr_1.4.7                  tzdb_0.4.0                 
[49] DBI_1.2.2                   cachem_1.0.8                stringr_1.5.1               zlibbioc_1.50.0             rvest_1.0.4                 AnnotationDbi_1.66.0       
[55] TCGAbiolinksGUI.data_1.24.0 BiocManager_1.30.23         XVector_0.44.0              matrixStats_1.3.0           vctrs_0.6.5                 Matrix_1.7-0               
[61] jsonlite_1.8.8              IRanges_2.38.0              hms_1.1.3                   S4Vectors_0.42.0            bit64_4.0.5                 ggrepel_0.9.5              
[67] tidyr_1.3.1                 glue_1.7.0                  stringi_1.8.4               gtable_0.3.5                GenomeInfoDb_1.40.0         GenomicRanges_1.56.0       
[73] UCSC.utils_1.0.0            munsell_0.5.1               tibble_3.2.1                pillar_1.9.0                rappdirs_0.3.3              GenomeInfoDbData_1.2.12    
[79] R6_2.5.1                    dbplyr_2.5.0                httr2_1.0.1                 lattice_0.22-6              Biobase_2.64.0              readr_2.1.5                
[85] png_0.1-8                   memoise_2.0.1               dorothea_1.16.0             Rcpp_1.0.12                 SparseArray_1.4.3           xfun_0.44                  
[91] downloader_0.4              MatrixGenerics_1.16.0       pkgconfig_2.0.3            

Thank you

@kellentjioe

I have the same issue! Did you find a way to solve it? Thanks!

@warbol

warbol commented Aug 8, 2024

The issue is that BEATAML barcodes start with "aq-", which the code is not prepared to handle.

I added a fix in #634.

@ChristianRohde

Hi @warbol, I have exactly that issue and want to use TCGAbiolinks with your updated function. Do you have any advice on how I can do this? Can I install a package version from a development branch? Thank you, Christian

@warbol

warbol commented Sep 30, 2024

@ChristianRohde

library(remotes)
remotes::install_github(repo="BioinformaticsFMRP/TCGAbiolinks", ref = remotes::github_pull(634))

This will install the updated package from GitHub pull request #634.

@kellentjioe

Hi @warbol
Thank you! I tried to download the updated package, but I got the error below. Could you help me with that?

install_github('BioinformaticsFMRP/TCGAbiolinks', 'remotes::github_pull(634)')
Downloading GitHub repo BioinformaticsFMRP/TCGAbiolinks@remotes::github_pull(634)
Error in utils::download.file(url, path, method = method, quiet = quiet, :
cannot open URL 'https://api.github.com/repos/BioinformaticsFMRP/TCGAbiolinks/tarball/remotes%3A%3Agithub_pull%28634%29'

Thank you!

@warbol

warbol commented Oct 23, 2024

Could you try the devtools version of install_github instead of the remotes one?
library(devtools)
devtools::install_github(repo="BioinformaticsFMRP/TCGAbiolinks", ref = devtools::github_pull(634))

I just verified that it works, so it may be a network issue on your end.
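
One more thing that may be worth checking: in the failing install_github call above, the ref was passed as a quoted string, whereas the suggested command passes the result of github_pull(634) unquoted. The sketch below uses only base R to show that percent-encoding the quoted text reproduces the last segment of the URL in the error message, which suggests the text was sent to GitHub as a literal ref name. This is an inference from the error output, not a confirmed diagnosis.

```r
# The ref exactly as it was typed in the failing call: a quoted string.
quoted_ref <- "remotes::github_pull(634)"

# Percent-encode it, including reserved characters such as ':' and '('.
encoded_ref <- utils::URLencode(quoted_ref, reserved = TRUE)
print(encoded_ref)

# URL copied from the error message in the comment above.
failing_url <- "https://api.github.com/repos/BioinformaticsFMRP/TCGAbiolinks/tarball/remotes%3A%3Agithub_pull%28634%29"
print(endsWith(failing_url, encoded_ref))

# Unquoted form from the earlier comment (not run here; needs network access):
# remotes::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks",
#                         ref = remotes::github_pull(634))
```

If the two match, retrying with the ref argument unquoted, exactly as in the earlier command, would be the first thing to try before looking at network settings.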
