diff --git a/README.md b/README.md
index 67e5805..6f64aef 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,8 @@ Date: May 13, 2017
 * [Installation](#installation)
 * [Usage](#usage)
 * [Vignette in TitanCNA R package](#vignette-in-titancna-r-package)
-* [License](#license)
 * [Acknowledgements](#acknowledgements)
+* [License](#software-license)

 ## Links
 TitanCNA GitHub: https://github.com/gavinha/TitanCNA
@@ -72,101 +72,6 @@ The easiest way to generate these files is by using the downloadable pipeline fr

 R scripts are provided to run the R component of the TITAN analysis using the TitanCNA R/Bioconductor package. Please go to the [scripts](scripts/) directory and look at the README there for more details.

-**Input files**
-  This script assumes that the necessary input files have been generated. These are generated by the KRONOS workflow.
-  1. GC-corrected, normalized read coverage using the HMMcopy suite
-  2. Tumour allelic read counts at heterozygous SNPs (identifed from the normal sample).
-
-**Running the R script**
-1. Look at the usage of the R script
-  ```
-  # from the command line
-  > Rscript scripts/titanCNA.R --help
-  Usage: Rscript scripts/titanCNA.R [options]
-
-
-  Options:
-    --id=ID
-      Sample ID
-
-    --hetFile=HETFILE
-      File containing allelic read counts at HET sites. (Required)
-
-    --cnFile=CNFILE
-      File containing normalized coverage as log2 ratios. (Required)
-
-    --outDir=OUTDIR
-      Output directory to output the results. (Required)
-
-    --numClusters=NUMCLUSTERS
-      Number of clonal clusters. (Default: 1)
-
-    --numCores=NUMCORES
-      Number of cores to use. (Default: 1)
-
-    --ploidy_0=PLOIDY_0
-      Initial ploidy value; float (Default: 2)
-
-    --estimatePloidy=ESTIMATEPLOIDY
-      Estimate ploidy; TRUE or FALSE (Default: TRUE)
-
-    --normal_0=NORMAL_0
-      Initial normal contamination (1-purity); float (Default: 0.5)
-
-    --estimateNormal=ESTIMATENORMAL
-      Estimate normal contamination method; string {'map', 'fixed'} (Default: map)
-
-    --maxCN=MAXCN
-      Maximum number of copies to model; integer (Default: 8)
-    ...
-  ```
-
-  Additional arguments to consider are the following:
-  These arguments can be used to tune the model based on variance in the read coverage data and data-type (whole-exome sequencing or whole-genome sequencing).
-  ```
-    --alphaK=ALPHAK
-      Hyperparameter on Gaussian variance; for WES, use 2500; for WGS, use 10000;
-      float (Default: 10000)
-
-    --alphaKHigh=ALPHAKHIGH
-      Hyperparameter on Gaussian variance for extreme copy number states;
-      for WES, use 2500; for WGS, use 10000; float (Default: 10000)
-  ```
-
-2. Example usage of R script
-  ```
-  # normalized coverage file: test.cn.txt
-  # allelic read count file: test.het.txt
-  Rscript scripts/titanCNA.R --id test --hetFile test.het.txt --cnFile test.cn.txt \
-    --numClusters 1 --numCores 1 --normal_0 0.5 --ploidy_0 2 --alphaK 10000 \
-    --chrs "c(1:22, \"X\")" --estimatePloidy TRUE --outDir ./
-  ```
-
-3. Running TitanCNA for multiple restarts and model selection
-  ```
-  numClusters=3
-  numCores=4
-  ## run TITAN for each ploidy (2 to 4) and clusters (1 to numClusters)
-  echo "Maximum number of clusters: $numClusters";
-  for ploidy in $(seq 2 4)
-  do
-    echo "Running TITAN for $i clusters.";
-    outDir=run_ploidy$ploidy
-    mkdir $outDir
-    for numClust in $(seq 1 $numClusters)
-    do
-      echo "Running for ploidy=$ploidy";
-      Rscript scripts/titanCNA.R --id test --hetFile test.het.txt --cnFile test.cn.txt \
-        --numClusters $numClust --numCores $numCores --normal_0 0.5 --ploidy_0 $ploidy \
-        --chrs "c(1:22, \"X\")" --estimatePloidy TRUE --outDir $outDir
-    done
-    echo "Completed job for $numClust clusters."
-  done
-
-  ## select optimal solution
-  Rscript selectSolution.R run_ploidy2 run_ploidy3 run_ploidy4 0.05 ./
-  ```
-
 ## Vignette in TitanCNA R package
 The PDF of the vignette can be accessed from R
 ```
@@ -180,14 +85,14 @@ pathToPdf <- paste0(pathToInstall, "/int/doc/TitanCNA.pdf)
 ```
 The example provided will reproduce Figure 1 in the manuscript. However, it will be slightly different because the example is only based on the analysis of chr2, not genome-wide.

-## License
-TitanCNA R code is open source and is R/Bioconductor package is under GPLv3. This applies to the v1.9.0 and all subsequent versions within and obtained from Bioconductor.
-Users who are using TitanCNA earlier than v1.9.0 not for the purpose of academic research should contact gavinha@broadinstitute.org, sshah@bccrc.ca, and prebstein@bccancer.bc.ca to inquire about previous licensing.
-
-# Acknowledgements
+## Acknowledgements
 TitanCNA was developed by Gavin Ha while in the laboratories of Sohrab Shah (sshah@bccrc.ca) and Sam Aparicio (saparicio@bccrc.ca) at the Dept of Molecular Oncology, BC Cancer Agency, Vancouver, Canada.
 Yikan Wang and Daniel Lai have contributed code and discussions to this project.
 The KRONOS TITAN workflow was developed by Diljot Grewal () and Jafar Taghiyar ().
 HMMcopy was co-developed by Daniel Lai and Gavin Ha.

 TitanCNA was inspired by existing methods including [OncoSNP](https://sites.google.com/site/oncosnp/) and [PyClone](https://bitbucket.org/aroth85/pyclone/wiki/Home)
+
+## Software License
+TitanCNA R code is open source and is R/Bioconductor package is under GPLv3. This applies to the v1.9.0 and all subsequent versions within and obtained from Bioconductor.
+Users who are using TitanCNA earlier than v1.9.0 not for the purpose of academic research should contact gavinha@broadinstitute.org, sshah@bccrc.ca, and prebstein@bccancer.bc.ca to inquire about previous licensing.