Error Downloading RNAseq Data with gdcRNADownload() #20
Comments
I got the same issue. It looks like the link used to obtain the manifest from the GDC API has changed, and now we get an empty table. The URL query has to be changed.
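A quick way to confirm the "empty table" symptom, as a minimal sketch: a manifest that contains only its header line parses to a data frame with zero rows. The column names below are an assumption about the GDC manifest layout, and `manifest_text` is a hypothetical stand-in for the API response, not real output.

```r
# Hypothetical stand-in for a manifest response that has a header but no files.
# The column names are assumed, not copied from a real GDC response.
manifest_text <- "id\tfilename\tmd5\tsize\tstate"

# Parse it the way a tab-separated manifest would be read.
manifest <- suppressWarnings(
  read.table(text = manifest_text, header = TRUE, sep = "\t",
             stringsAsFactors = FALSE)
)

# Zero rows means the query matched no files, so nothing can be downloaded.
if (nrow(manifest) == 0) {
  message("Manifest is empty: the filter matched no files.")
}
```

Checking `nrow()` on the parsed manifest before starting any download makes this failure visible earlier than the "Successfully downloaded: 0" message.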
There is some issue with the HTSeq - Counts data on the GDC portal; I guess it is no longer available after the new update. So we need to change the workflow.type to "STAR - Counts".
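To illustrate the change, here is a minimal sketch that rebuilds the query with the new workflow type. `build_gdc_files_url()` is a hypothetical helper, not a GDCRNATools function; the field names are taken from the URL in the warning message of the original report, and the filter values are written as JSON arrays, which is an assumption about what the API accepts.

```r
# Hypothetical helper (not part of GDCRNATools): builds a GDC /files query URL.
build_gdc_files_url <- function(project_id, workflow_type = "STAR - Counts") {
  # One "in" clause of the filter, written as a JSON string.
  in_clause <- function(field, value) {
    sprintf('{"op":"in","content":{"field":"%s","value":["%s"]}}', field, value)
  }
  # Combine all clauses with a top-level "and".
  filters <- sprintf(
    '{"op":"and","content":[%s]}',
    paste(
      in_clause("cases.project.project_id", project_id),
      in_clause("files.data_category", "Transcriptome Profiling"),
      in_clause("files.data_type", "Gene Expression Quantification"),
      in_clause("files.analysis.workflow_type", workflow_type),
      sep = ","
    )
  )
  # Percent-encode the JSON so it can travel in the query string.
  paste0(
    "https://api.gdc.cancer.gov/files?filters=",
    utils::URLencode(filters, reserved = TRUE),
    "&size=10000&return_type=manifest"
  )
}

url <- build_gdc_files_url("TCGA-CHOL")

# Reading the manifest needs network access, so it is left commented out:
# manifest <- read.table(url, header = TRUE, stringsAsFactors = FALSE)
```

Keeping the workflow type as an argument means the same helper can be pointed at another value if the portal changes again.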
Hi all! I've finally been able to get access to the code of some of the functions involved. As @pranavkataria978 mentioned, the issue appears when trying to download the "RNA-seq" data type. The function gdcGetURL() looks for the workflow type "HTSeq - Counts", which no longer exists. The closest workflow available now (after also checking the database) is "STAR - Counts", as he said.
So if you create your own version of gdcGetURL() with this small change, it should work. The only problem is that this function calls several other functions that, for some reason, are not found when called from outside the original package (no idea why this happens...). So in the end I had to rename a few more functions and save them in the current environment to make it all work again. After this step, all these functions can be found and called.
So here it goes, as I have it right now, to make the RNAseq download work:
gdcGetURL_2 #############
And for the other functions, I just renamed them and saved them in my local environment because of the problems I mentioned before:
downloadClientFun_2 <- function (os) { #############
file.move_2 <- function (files, directory) #############
manifestDownloadFun_2 <- function (manifest = manifest, directory) #############
gdcRNADownload_2 <- function (manifest = NULL, project.id, data.type, directory = "Data", #############
I believe I haven't missed any of them. Now it should all work nicely. For example:
project <- 'TCGA-PRAD'
Let me know if I may have missed something!
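A possible explanation for the "functions not found" part, offered as an assumption and not something confirmed in this thread: helpers that a package does not export are only visible from inside its namespace, so a copy of a function pasted into the global environment can no longer see them. The sketch below reproduces the effect with a stand-in environment; `fake_ns`, `internalHelper`, and `copiedFun` are all hypothetical names.

```r
# Stand-in for a package namespace that holds an internal, non-exported helper.
fake_ns <- new.env()
assign("internalHelper", function(x) paste("helper saw", x), envir = fake_ns)

# A copy of a package function, pasted into the current environment.
copiedFun <- function(x) internalHelper(x)

# Fails: the copy looks for internalHelper where it was defined, not in fake_ns.
first_try <- tryCatch(copiedFun("a"), error = function(e) conditionMessage(e))

# Re-parent the copy so it resolves names inside the "namespace" again.
environment(copiedFun) <- fake_ns
second_try <- copiedFun("a")
```

If the same cause applies here, something like `environment(gdcGetURL_2) <- asNamespace("GDCRNATools")` might avoid having to copy every helper by hand, though that has not been tested against the package.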
Hello @Josuerinho, I want to test your code, but I get this error:
Error in paste(filters, "pretty=true", "format=JSON", "size=10000", expand, : object 'filters' not found
> url <- paste(urlAPI, payload, sep = "")
Error in paste(urlAPI, payload, sep = "") : object 'urlAPI' not found
> return(url)
Error: no function to return from, jumping to top level
> }
Error: unexpected '}' in "}"
Hi @benchsar! Sorry for the late reply. The code I posted was just a small workaround to get the original functions working, but the problem has since been solved and the original functions work as expected again. Try them and let me know if that's not the case.
Hi all!!
I've been trying to use the function gdcRNADownload() to download RNAseq data from TCGA but no matter what RNAseq type I try, I always get the same error:
Successfully downloaded: 0
Warning message:
In read.table(paste(url, "&return_type=manifest", sep = ""), header = TRUE, :
incomplete final line found by readTableHeader on 'https://api.gdc.cancer.gov/files?filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-CHOL%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:%22Transcriptome%20Profiling%22%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:%22Gene%20Expression%20Quantification%22%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:%22HTSeq%20-%20Counts%22%7D%7D]%7D&pretty=true&format=JSON&size=10000&expand=analysis,analysis.input_files,associated_entities,cases,cases.diagnoses,cases.diagnoses.treatments,cases.demographic,cases.project,cases.samples,cases.samples.portions,cases.samples.portions.analytes,cases.samples.portions.analytes.aliquots,cases.samples.portions.slides&return_type=manifest'
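The percent-encoded URL in the warning is hard to read, so here is a small sketch that decodes one clause of it with base R. The `encoded` string is copied from the URL above; only the variable names are made up for illustration.

```r
# The workflow_type clause, copied from the URL in the warning message.
encoded <- paste0(
  "%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:",
  "%22files.analysis.workflow_type%22,%22value%22:",
  "%22HTSeq%20-%20Counts%22%7D%7D"
)

# Turn the percent escapes back into plain JSON text.
decoded <- utils::URLdecode(encoded)
cat(decoded, "\n")
```

Decoded, the clause shows that the request filters on `files.analysis.workflow_type` with the value "HTSeq - Counts".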
It only happens with RNAseq data; I can download miRNA data without problems. Initially I was working on a MacBook Air with an M1 chip:
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.4.0 readxl_1.4.0 tibble_3.1.6 oligo_1.56.0
[5] Biostrings_2.60.2 GenomeInfoDb_1.28.4 XVector_0.32.0 IRanges_2.26.0
[9] S4Vectors_0.30.2 oligoClasses_1.54.0 GEOquery_2.60.0 Biobase_2.52.0
[13] BiocGenerics_0.38.0 edgeR_3.34.1 limma_3.48.3 GDCRNATools_1.13.1
loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.1.2 RSQLite_2.2.12
[4] AnnotationDbi_1.54.1 htmlwidgets_1.5.4 grid_4.1.1
[7] BiocParallel_1.26.2 scatterpie_0.1.7 munsell_0.5.0
[10] codetools_0.2-18 preprocessCore_1.54.0 DT_0.22
[13] colorspace_2.0-3 GOSemSim_2.18.1 filelock_1.0.2
[16] knitr_1.38 rstudioapi_0.13 ggsignif_0.6.3
[19] DOSE_3.18.3 pathview_1.32.0 MatrixGenerics_1.4.3
[22] KEGGgraph_1.52.0 GenomeInfoDbData_1.2.6 KMsurv_0.1-5
[25] polyclip_1.10-0 bit64_4.0.5 farver_2.1.0
[28] downloader_0.4 vctrs_0.4.0 treeio_1.16.2
[31] generics_0.1.2 xfun_0.30 BiocFileCache_2.0.0
[34] affxparser_1.64.1 R6_2.5.1 graphlayouts_0.8.0
[37] locfit_1.5-9.5 bitops_1.0-7 cachem_1.0.6
[40] fgsea_1.18.0 gridGraphics_0.5-1 DelayedArray_0.18.0
[43] assertthat_0.2.1 promises_1.2.0.1 scales_1.1.1
[46] ggraph_2.0.5 enrichplot_1.12.3 gtable_0.3.0
[49] tidygraph_1.2.1 rlang_1.0.2 genefilter_1.74.1
[52] splines_4.1.1 rstatix_0.7.0 lazyeval_0.2.2
[55] broom_0.7.12 BiocManager_1.30.16 reshape2_1.4.4
[58] abind_1.4-5 backports_1.4.1 httpuv_1.6.5
[61] qvalue_2.24.0 clusterProfiler_4.0.5 tools_4.1.1
[64] ggplotify_0.1.0 ggplot2_3.3.5 affyio_1.62.0
[67] ellipsis_0.3.2 gplots_3.1.1 ff_4.0.5
[70] RColorBrewer_1.1-3 Rcpp_1.0.8.3 plyr_1.8.7
[73] progress_1.2.2 zlibbioc_1.38.0 purrr_0.3.4
[76] RCurl_1.98-1.6 prettyunits_1.1.1 ggpubr_0.4.0
[79] viridis_0.6.2 cowplot_1.1.1 zoo_1.8-9
[82] SummarizedExperiment_1.22.0 ggrepel_0.9.1 magrittr_2.0.3
[85] data.table_1.14.2 DO.db_2.9 survminer_0.4.9
[88] matrixStats_0.61.0 hms_1.1.1 patchwork_1.1.1
[91] mime_0.12 xtable_1.8-4 XML_3.99-0.9
[94] gridExtra_2.3 compiler_4.1.1 biomaRt_2.48.3
[97] KernSmooth_2.23-20 crayon_1.5.1 shadowtext_0.1.1
[100] htmltools_0.5.2 ggfun_0.0.6 later_1.3.0
[103] tzdb_0.3.0 tidyr_1.2.0 geneplotter_1.70.0
[106] aplot_0.1.3 DBI_1.1.2 tweenr_1.0.2
[109] dbplyr_2.1.1 MASS_7.3-56 rappdirs_0.3.3
[112] Matrix_1.4-1 car_3.0-12 readr_2.1.2
[115] cli_3.2.0 igraph_1.3.0 km.ci_0.5-2
[118] GenomicRanges_1.44.0 pkgconfig_2.0.3 xml2_1.3.3
[121] foreach_1.5.2 ggtree_3.0.4 annotate_1.70.0
[124] yulab.utils_0.0.4 digest_0.6.29 graph_1.70.0
[127] cellranger_1.1.0 fastmatch_1.1-3 survMisc_0.5.5
[130] tidytree_0.3.9 curl_4.3.2 shiny_1.7.1
[133] gtools_3.9.2 rjson_0.2.21 lifecycle_1.0.1
[136] nlme_3.1-157 GenomicDataCommons_1.16.0 jsonlite_1.8.0
[139] carData_3.0-5 viridisLite_0.4.0 fansi_1.0.3
[142] pillar_1.7.0 lattice_0.20-45 KEGGREST_1.32.0
[145] fastmap_1.1.0 httr_1.4.2 survival_3.3-1
[148] GO.db_3.13.0 glue_1.6.2 png_0.1-7
[151] iterators_1.0.14 bit_4.0.4 Rgraphviz_2.36.0
[154] ggforce_0.3.3 stringi_1.7.6 blob_1.2.2
[157] DESeq2_1.32.0 org.Hs.eg.db_3.13.0 caTools_1.18.2
[160] memoise_2.0.1 dplyr_1.0.8 ape_5.6-2
But I also get the same issue when I run the same function on the cluster:
sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Springdale Linux 7.9 (Verona)
Matrix products: default
BLAS/LAPACK: /ifs/data/fg2532_lab/jc5737/Conda_env/lib/libopenblasp-r0.3.18.so
locale:
[1] C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] stringr_1.4.0 readxl_1.4.0 tibble_3.1.6
[4] oligo_1.58.0 Biostrings_2.62.0 GenomeInfoDb_1.30.1
[7] XVector_0.34.0 IRanges_2.28.0 S4Vectors_0.32.4
[10] oligoClasses_1.56.0 GEOquery_2.62.2 Biobase_2.54.0
[13] BiocGenerics_0.40.0 edgeR_3.36.0 limma_3.50.1
[16] GDCRNATools_1.14.0
loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.1.2
[3] RSQLite_2.2.12 AnnotationDbi_1.56.2
[5] htmlwidgets_1.5.4 grid_4.1.3
[7] BiocParallel_1.28.3 scatterpie_0.1.7
[9] munsell_0.5.0 preprocessCore_1.56.0
[11] codetools_0.2-18 DT_0.22
[13] colorspace_2.0-3 GOSemSim_2.20.0
[15] filelock_1.0.2 knitr_1.38
[17] ggsignif_0.6.3 DOSE_3.20.1
[19] pathview_1.34.0 MatrixGenerics_1.6.0
[21] KEGGgraph_1.54.0 GenomeInfoDbData_1.2.7
[23] KMsurv_0.1-5 polyclip_1.10-0
[25] bit64_4.0.5 farver_2.1.0
[27] downloader_0.4 vctrs_0.4.0
[29] treeio_1.18.1 generics_0.1.2
[31] xfun_0.30 BiocFileCache_2.2.1
[33] affxparser_1.66.0 R6_2.5.1
[35] graphlayouts_0.8.0 locfit_1.5-9.5
[37] bitops_1.0-7 cachem_1.0.6
[39] fgsea_1.20.0 gridGraphics_0.5-1
[41] DelayedArray_0.20.0 assertthat_0.2.1
[43] promises_1.2.0.1 scales_1.1.1
[45] ggraph_2.0.5 enrichplot_1.14.2
[47] gtable_0.3.0 tidygraph_1.2.1
[49] rlang_1.0.2 genefilter_1.76.0
[51] splines_4.1.3 rstatix_0.7.0
[53] lazyeval_0.2.2 broom_0.7.12
[55] BiocManager_1.30.16 reshape2_1.4.4
[57] abind_1.4-5 backports_1.4.1
[59] httpuv_1.6.5 qvalue_2.26.0
[61] clusterProfiler_4.2.2 tools_4.1.3
[63] ggplotify_0.1.0 ggplot2_3.3.5
[65] affyio_1.64.0 ellipsis_0.3.2
[67] gplots_3.1.1 ff_4.0.5
[69] RColorBrewer_1.1-3 Rcpp_1.0.8.3
[71] plyr_1.8.7 progress_1.2.2
[73] zlibbioc_1.40.0 purrr_0.3.4
[75] RCurl_1.98-1.6 prettyunits_1.1.1
[77] ggpubr_0.4.0 viridis_0.6.2
[79] zoo_1.8-9 SummarizedExperiment_1.24.0
[81] ggrepel_0.9.1 magrittr_2.0.3
[83] data.table_1.14.2 DO.db_2.9
[85] survminer_0.4.9 matrixStats_0.61.0
[87] hms_1.1.1 patchwork_1.1.1
[89] mime_0.12 xtable_1.8-4
[91] XML_3.99-0.9 gridExtra_2.3
[93] compiler_4.1.3 biomaRt_2.50.3
[95] KernSmooth_2.23-20 crayon_1.5.1
[97] shadowtext_0.1.1 htmltools_0.5.2
[99] ggfun_0.0.6 later_1.3.0
[101] tzdb_0.3.0 tidyr_1.2.0
[103] geneplotter_1.72.0 aplot_0.1.3
[105] DBI_1.1.2 tweenr_1.0.2
[107] dbplyr_2.1.1 MASS_7.3-56
[109] rappdirs_0.3.3 Matrix_1.4-1
[111] car_3.0-12 readr_2.1.2
[113] cli_3.2.0 parallel_4.1.3
[115] igraph_1.3.0 GenomicRanges_1.46.1
[117] pkgconfig_2.0.3 km.ci_0.5-6
[119] xml2_1.3.3 foreach_1.5.2
[121] ggtree_3.2.1 annotate_1.72.0
[123] yulab.utils_0.0.4 digest_0.6.29
[125] graph_1.72.0 cellranger_1.1.0
[127] fastmatch_1.1-3 survMisc_0.5.6
[129] tidytree_0.3.9 curl_4.3.2
[131] shiny_1.7.1 gtools_3.9.2
[133] rjson_0.2.21 lifecycle_1.0.1
[135] nlme_3.1-157 GenomicDataCommons_1.18.0
[137] jsonlite_1.8.0 carData_3.0-5
[139] viridisLite_0.4.0 fansi_1.0.3
[141] pillar_1.7.0 lattice_0.20-45
[143] KEGGREST_1.34.0 fastmap_1.1.0
[145] httr_1.4.2 survival_3.3-1
[147] GO.db_3.14.0 glue_1.6.2
[149] png_0.1-7 iterators_1.0.14
[151] bit_4.0.4 Rgraphviz_2.38.0
[153] ggforce_0.3.3 stringi_1.7.6
[155] blob_1.2.2 DESeq2_1.34.0
[157] org.Hs.eg.db_3.14.0 caTools_1.18.2
[159] memoise_2.0.1 dplyr_1.0.8
[161] ape_5.6-2
So I don't know how to solve the problem: when I try to troubleshoot gdcRNADownload() by following the code line by line, R says that one of the inner functions, gdcGetURL(), is not found. I can't tell where the error comes from because I can't access the URL containing the RNAseq data. It might even be a format problem with the downloaded data. I know this issue was reported before, but since there was no follow-up, I thought a new thread might bring a bit more attention. Sorry, guys, and thanks a lot for your help!
Josu