GDCRNATools - an R/Bioconductor package for downloading, organizing, and integratively analyzing lncRNA, mRNA, and miRNA data in GDC
- The GDCRNATools Manual and the R code of the GDCRNATools Workflow were updated on 10-30-2018.
- If you use GDCRNATools in your published research, please cite:
Li, R., Qu, H., Wang, S., Wei, J., Zhang, L., Ma, R., Lu, J., Zhu, J., Zhong, W., and Jia, Z. (2018). GDCRNATools: an R/Bioconductor package for integrative analysis of lncRNA, miRNA and mRNA data in GDC. Bioinformatics 34, 2515-2517. https://doi.org/10.1093/bioinformatics/bty124.
- Please add my WeChat (rli012) or send an email to [email protected] if you have further questions.
The Genomic Data Commons (GDC) maintains standardized genomic, clinical, and biospecimen data from National Cancer Institute (NCI) programs, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research To Generate Effective Treatments (TARGET). It also accepts high-quality datasets from cancer research programs not supported by the NCI, such as genomic data from Foundation Medicine.
GDCRNATools is an R/Bioconductor package that provides a standard, easy-to-use, and comprehensive pipeline for downloading, organizing, and integratively analyzing RNA expression data in the GDC portal, with an emphasis on deciphering the lncRNA-mRNA related ceRNA regulatory network in cancer.
The comprehensive manual of GDCRNATools is available here: GDCRNATools Manual
The R code of the workflow is available here: GDCRNATools Workflow
- The stable release version of GDCRNATools requires R (>= 3.5.0) and Bioconductor (>= 3.8). Please start R and enter:
## try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("GDCRNATools")
- To install the development version of GDCRNATools, please update your R and Bioconductor to the latest versions and run:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("GDCRNATools", version = "devel")
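Before installing, it can help to confirm that the running R session meets the minimum version stated above. The following is a minimal sketch, not part of GDCRNATools: `meets_minimum()` is a hypothetical helper introduced here for illustration. It compares version strings as versions rather than as numbers, so that, for example, Bioconductor 3.10 is correctly treated as newer than 3.8.

```r
# Hypothetical helper (not part of GDCRNATools): compare version strings
# component by component, so "3.10" is treated as newer than "3.8".
meets_minimum <- function(current, minimum) {
    numeric_version(current) >= numeric_version(minimum)
}

# Check the running R session against the R (>= 3.5.0) requirement
r_ok <- meets_minimum(as.character(getRversion()), "3.5.0")
message("R ", as.character(getRversion()),
        if (r_ok) " meets" else " does not meet",
        " the R (>= 3.5.0) requirement")
```

If BiocManager is already installed, `as.character(BiocManager::version())` could be passed through the same helper with a minimum of "3.8" to check the Bioconductor requirement.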
Please download the compressed package here: GDCRNATools_1.1.5.tar.gz
- Make sure that your R is installed in 'c:\program files'
- Install Rtools in 'c:\program files'
- Add R and Rtools to the Path variable on the Environment Variables panel, including
c:\program files\Rtools\bin
c:\program files\Rtools\gcc-4.6.3\bin
c:\program files\R\R.3.x.x\bin\i386
c:\program files\R\R.3.x.x\bin\x64
- Run the following code in R
install.packages('GDCRNATools_1.1.5.tar.gz', repos = NULL, type='source')
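If the source installation fails on Windows, one possible cause is that the Path entries listed above were not picked up by R. The sketch below is one way to check this from within R. `find_missing_tools()` is a hypothetical helper, and the assumption that `gcc` and `make` are the relevant executables in the Rtools directories is ours, not the package's.

```r
# Hypothetical helper (not part of GDCRNATools): return the tools that
# cannot be found on the PATH seen by the current R session.
find_missing_tools <- function(tools) {
    found <- Sys.which(tools)
    # Sys.which() typically returns "" (or NA on some platforms) when not found
    tools[is.na(found) | !nzchar(found)]
}

# Assumption: gcc and make are the build tools provided by Rtools
missing_tools <- find_missing_tools(c("gcc", "make"))
if (length(missing_tools) > 0) {
    message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
    message("Build tools found on PATH")
}
```

If a tool is reported as not found, rechecking the Path entries and restarting R is a reasonable first step.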
Just run the following code in R
install.packages('GDCRNATools_1.1.5.tar.gz', repos = NULL, type='source')
If GDCRNATools cannot be installed because of missing dependencies, please run the following code beforehand to install those packages, either simultaneously or separately:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
### install packages simultaneously ###
BiocManager::install(c('limma', 'edgeR', 'DESeq2', 'clusterProfiler', 'DOSE', 'org.Hs.eg.db', 'biomaRt', 'BiocParallel', 'GenomicDataCommons'))
install.packages(c('shiny', 'jsonlite', 'rjson', 'survival', 'survminer', 'ggplot2', 'gplots', 'Hmisc', 'DT', 'matrixStats', 'xml2'))
### install packages separately ###
BiocManager::install('limma')
BiocManager::install('edgeR')
BiocManager::install('DESeq2')
BiocManager::install('clusterProfiler')
BiocManager::install('DOSE')
BiocManager::install('org.Hs.eg.db')
BiocManager::install('biomaRt')
BiocManager::install('BiocParallel')
BiocManager::install('GenomicDataCommons')
install.packages('shiny')
install.packages('jsonlite')
install.packages('rjson')
install.packages('survival')
install.packages('survminer')
install.packages('ggplot2')
install.packages('gplots')
install.packages('Hmisc')
install.packages('DT')
install.packages('matrixStats')
install.packages('xml2')
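Reinstalling every dependency can take a while when most of them are already present. As an alternative, the following sketch installs only the packages that are missing. `install_missing()` is a hypothetical helper written for this example, not a GDCRNATools or BiocManager function. The package lists are copied from the commands above.

```r
# Dependencies listed in the installation commands above
bioc_pkgs <- c("limma", "edgeR", "DESeq2", "clusterProfiler", "DOSE",
               "org.Hs.eg.db", "biomaRt", "BiocParallel", "GenomicDataCommons")
cran_pkgs <- c("shiny", "jsonlite", "rjson", "survival", "survminer",
               "ggplot2", "gplots", "Hmisc", "DT", "matrixStats", "xml2")

# Hypothetical helper (not part of GDCRNATools): call `installer` only on
# the packages that are not yet installed, and return them invisibly.
# `installer` is a function such as BiocManager::install or install.packages.
install_missing <- function(pkgs, installer,
                            installed = rownames(installed.packages())) {
    to_install <- setdiff(pkgs, installed)
    if (length(to_install) > 0) {
        installer(to_install)
    }
    invisible(to_install)
}
```

With BiocManager installed, calling `install_missing(bioc_pkgs, BiocManager::install)` followed by `install_missing(cran_pkgs, install.packages)` would then fetch only what is absent from the current library.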
Q1: gdcRNADownload() function doesn't work with the following error:
Error in FUN(X[[i]], ...):
unused arguments(desination_dir=directory, overwrite=TRUE)
A1: This error occurs when the default API method for downloading fails. Please add method='gdc-client' to the gdcRNADownload() function.
####### Download RNAseq data #######
project <- 'TCGA-CHOL'
rnadir <- paste(project, 'RNAseq', sep='/')
gdcRNADownload(project.id     = 'TCGA-CHOL',
               data.type      = 'RNAseq',
               write.manifest = FALSE,
               method         = 'gdc-client', ### use 'gdc-client' to download data
               directory      = rnadir)
Q2: gdcRNAMerge() doesn't work with the following error:
Error in open.connection(file, 'rt'): cannot open the connection
In addition: Warning message:
In open.connection(file, 'rt'):
cannot open compressed file 'TCGA-XXXX/RNAseq/xxx-xxx-xxx-xxx.htseq.counts.gz', probable reason 'No such file or directory'.
A2: This usually happens because the data for different samples were downloaded into separate folders. Please add organized=FALSE to the gdcRNAMerge() function.
####### Merge RNAseq data #######
rnaCounts <- gdcRNAMerge(metadata  = metaMatrix.RNA,
                         path      = rnadir,
                         organized = FALSE, # if the data are in separate folders
                         data.type = 'RNAseq')
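To see whether this situation applies before merging, the download directory can be inspected for files that sit inside per-sample subfolders. The following is a rough sketch under that assumption. `has_sample_subfolders()` is a hypothetical helper, not part of GDCRNATools, and it only looks at the folder layout, not at the file contents.

```r
# Hypothetical helper (not part of GDCRNATools): TRUE if any file under
# `path` is located in a subfolder rather than directly in `path`.
has_sample_subfolders <- function(path) {
    files <- list.files(path, recursive = TRUE)
    # list.files() returns paths relative to `path`; files at the top
    # level have dirname ".", files in subfolders do not
    any(dirname(files) != ".")
}
```

If `has_sample_subfolders(rnadir)` returns TRUE, the files are spread across separate folders, which is the case described above where organized=FALSE is needed.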