Skip to content

Commit

Permalink
mod penalty options, mod README
Browse files Browse the repository at this point in the history
  • Loading branch information
andrej-fischer committed Apr 2, 2014
1 parent e55d94d commit 24eb016
Show file tree
Hide file tree
Showing 8 changed files with 116 additions and 109 deletions.
43 changes: 21 additions & 22 deletions docs/README-cloneHD.md
Original file line number Diff line number Diff line change
Expand Up @@ -179,39 +179,38 @@ Both can be learned with filterHD for data with persistence.
4 2
etc.

The first column is the chromosome, the next columns are the limits to be used for subclone 1, 2 etc.
For subclones not specified, the limit in the last column is used. In the example above, subclone 1 has an upper limit of 8 total copies in chr3, for all other subclones and in all other chromosomes, the upper limit is 2. If only SNV data is provided (and `--avail-cn [file]` is not given), this is used to fix the total number of copies. If `--max-tcn` is not given, cloneHD uses the normal copy number for each chr.
The first column is the chromosome, the next columns are the limits to be used for subclone 1, 2 etc. For subclones not specified, the limit in the last column is used. In the example above, subclone 1 has an upper limit of 8 total copies in chr3, for all other subclones and in all other chromosomes, the upper limit is 2. If only SNV data is provided (and `--avail-cn [file]` is not given), this is used to fix the total number of copies. If `--max-tcn` is not given, cloneHD uses the normal copy number for each chr.

* `--learn-priors [0/1:0]` For snv-mode only: if 1, then the parameters
for the multiplicative genotype priors are learned.
for the multiplicative SNV genotype priors are learned.

* `--chr [file]` Set normal copy numbers.

The normal copy number for every single
chromosome can be specified. This is needed only for non-human DNA. If not
given, human DNA is assumed and the sex is inferred from the
presence or absence of chr 24 (= chr Y) in the input data.
The normal copy number for every single chromosome can be specified. This is needed only for non-human DNA. If not given, human DNA is assumed and the sex is inferred from the presence or absence of chr 24 (= chr Y) in the input data.

* `--snv-fprate [double:1.0e-4]` The false positive rate for SNVs,
i.e. rate of SNV data points of genotype all-0.
* `--snv-fprate [double:1.0e-4]` Set the false positive rate for SNVs.

* `--snv-fpfreq [double:0.01]` The typical frequency of false positive SNVs.

* `--snv-pen [double:0.01]` The penalty for higher than expected
genotypes.
* `--snv-pen-mult [double:0.01]` Set the penalty against multiple SNVs.

* `--baf-pen [double:1.0]` The penalty for complex minor allele
status.
* `--snv-pen-high [doube:0.5]` Set the penalty against higher genotypes.

* `--cna-pen-zero [double:0.9]` Set the penalty against zero total copies.

* `--cna-pen-norm [double:1.0]` Set the penalty against non-normal total c.n.

* `--cna-pen-diff [double:1.0]` Set the penalty against different total c.n.

* `--baf-pen-comp [double:1.0]` The penalty against complex minor allele status.

* `--cna-jump [double:-1.0]`

* `--baf-jump [double:-1.0]`

* `--snv-jump [double:-1.0]`

A constant jump probability per base pair. If -1, then observations are
uncorrellated along the genome. Can be learned with filterHD. No fuzzy
data segmentation is performed. Useful in combination with
`--clones`, where very high-definition information
available. Using this option will change the posterior output file format.
A constant jump probability per base pair can be set. If set to `-1.0`, then observations are uncorrellated along the genome (not the same as `1.0`). Can be learned with filterHD. No fuzzy data segmentation is performed. Useful in combination with `--clones`, where very high-definition information available. Using this option will change the posterior output file format.

## Bulk options

Expand All @@ -221,16 +220,16 @@ subclones with unknown genotypes and frequencies. Allele frequency
data is input with `--snv`. Data segmentation can be used with
`--snv-jumps`. Read depth data can also be specified with `--cna`.

* `--bulk-mean [double]` The bulk allele frequency profile.
* `--bulk-mean [file]` The bulk allele frequency profile.

Must be a filterHD `*posterior.*.txt` file. Only the posterior mean is used.
Must be a filterHD `*posterior-[int].txt` file. Only the posterior mean is used.

* `--bulk-prior [file]` The bulk allele frequency profile.

Must be a filterHD `*posterior.*.txt` file. The whole posterior
Must be a filterHD `*posterior-[int].txt` file. The whole posterior
distribution is used (run filterHD with `--dist 1` to obtain it).

* `--bulk-updates [int:0]` The number of Bayesian updates of the
* `--bulk-updates [int:0]` The number of (Bayesian) updates of the
bulk allele frequency profile (if `--bulk-prior` was used).

* `--bulk-fix [double:0.0]` Use a flat and fixed bulk allele
Expand Down
5 changes: 3 additions & 2 deletions run-example.sh
Original file line number Diff line number Diff line change
Expand Up @@ -70,17 +70,18 @@ then
echo "*** cloneHD ***"
echo
# The CNA and BAF data is analysed for subclonality.
# Try varying the --min-jump, --force and --max-tcn value values and try --mass-gauging 0.
# Try varying the --min-jump, --force and --max-tcn values and try --mass-gauging 0.
# Try adding the SNV data to the mix.
cmd="$cloneHD --cna $tumorCNA --baf $tumorBAF --pre ${results}/tumor --bias $bias --seed 123 --trials 2\
--nmax 3 --force --max-tcn 4 --cna-jumps $tumorCNAjumps --baf-jumps $tumorBAFjumps --min-jump 0.01 --restarts 10 --mass-gauging 1"
echo $cmd
$cmd
echo
cat ${results}/tumor.summary.txt
echo

# Using the information from above, the SNV data is analysed. Try what happens removing the --avail-cn and --mean-tcn options.
cmd="$cloneHD --snv $tumorSNV --pre ${results}/tumor --seed 123 --trials 2\
cmd="$cloneHD --snv $tumorSNV --pre ${results}/tumorSNV --seed 123 --trials 2\
--nmax 3 --force --max-tcn 4 --restarts 10 --mean-tcn ${results}/tumor.mean-tcn.txt --avail-cn ${results}/tumor.avail-cn.txt"
echo $cmd
$cmd
Expand Down
56 changes: 24 additions & 32 deletions src/clone-prior.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,15 @@ void Clone::set_cna_prior( gsl_vector * prior, int sample){
cns.clear();
double p = 1.0;
for (int j=0; j<nClones; j++){
if (copynumber[i][j] == 0) p *= cna_pen_zero;//penalty for no copies
p *= pow( cna_pen_norm, abs(copynumber[i][j] - ncn));
if (copynumber[i][j] == 0) p *= cna_pen_zero;
if (copynumber[i][j] > maxtcn_per_clone[chr][j]) p = 0.0;
cns.insert( copynumber[i][j] );
}
p *= pow( cna_pen_diff, (int) cns.size());
for (int j=0; j<nClones; j++) p *= pow( cna_pen_norm, abs(copynumber[i][j] - ncn) );
gsl_vector_set( prior, i, p);
}
//normalize and logify...
double norm = gsl_blas_dasum(prior);
if (norm <= 0.0 || norm != norm) abort();
gsl_vector_scale( prior, 1.0 / norm);
Expand All @@ -49,7 +50,7 @@ void Clone::initialize_snv_prior_param(){// SNV prior, conditional on max-tcn
if (initial_snv_prior_param != NULL) gsl_matrix_free(initial_snv_prior_param);
initial_snv_prior_param = gsl_matrix_calloc( maxtcn+1, maxtcn+1);
gsl_matrix_set( initial_snv_prior_param, 0, 0, snv_fpr);
double p = pinit;// initial penalty for higher genotypes
double p = snv_pen_high;// initial penalty for higher genotypes
for (int cn=1; cn <= maxtcn; cn++){
if ( all_maxtcn.count(cn) == 0 ) continue;
gsl_vector_view subrow = gsl_matrix_subrow( initial_snv_prior_param, cn, 0, cn+1);
Expand All @@ -75,7 +76,7 @@ void Clone::set_snv_prior( gsl_matrix * snv_prior_param){
for (int j=0; j<nClones; j++){
int limit = maxtcn_per_clone[chr][j];
if (copynumber[i][j] <= limit){
p *= (limit==0) ? 1.0 : gsl_matrix_get( snv_prior_param, limit, copynumber[i][j]);
p *= limit==0 ? 1.0 : gsl_matrix_get( snv_prior_param, limit, copynumber[i][j]);
}
else{
p = 0.0;
Expand All @@ -84,11 +85,12 @@ void Clone::set_snv_prior( gsl_matrix * snv_prior_param){
}
gsl_vector_set( snv_prior[chr], i, p);
}
//normalize and logify...
double norm = gsl_blas_dasum(snv_prior[chr]);
if (norm<=0) abort();
gsl_vector_scale( snv_prior[chr], (1.0-fpr) / norm);
gsl_vector_set( snv_prior[chr], 0, fpr);
if (snvEmit->log_space){//log-transform
if (snvEmit->log_space){
for (int l=0; l<nLevels; l++){
double val = gsl_vector_get( snv_prior[chr], l);
gsl_vector_set( snv_prior[chr], l, val>0 ? log(val) : logzero);
Expand All @@ -100,14 +102,13 @@ void Clone::set_snv_prior( gsl_matrix * snv_prior_param){
//CNA + BAF (+SNV) mode...
void Clone::set_baf_prior_map(){
if ( baf_prior_map == NULL){
baf_prior_map = gsl_matrix_alloc(maxtcn+1,maxtcn+1);
baf_prior_map = gsl_matrix_alloc( maxtcn+1, maxtcn+1);
}
gsl_matrix_set_zero( baf_prior_map );
double p = baf_pen;//penalty for complex chromosome status (default:1.0)
gsl_matrix_set_zero(baf_prior_map);
double f = 0;
for (int cn=0; cn <= maxtcn; cn++){
for (int bcn=0; bcn <= cn; bcn++){
f = pow( p, int( fabs(double(bcn) - 0.5*double(cn)) ));
for (int bcn=0; bcn <= cn; bcn++){//penalty for complex chromosome status
f = pow( baf_pen_comp, int( fabs(double(bcn) - 0.5*double(cn))));
gsl_matrix_set( baf_prior_map, bcn, cn, f);
}
//normalize...
Expand All @@ -130,12 +131,11 @@ void Clone::set_snv_prior_map(){//either via BAF or else via CNA
snv_prior_from_cna_baf_map[cn] = gsl_matrix_alloc( maxtcn+1, maxtcn+1);
}
}
double p = snv_pen;//penalty for SNVs in cn higher than max BAF cn (multiple hits)
for (int cn=0; cn <= maxtcn; cn++){//cn=total cn
gsl_matrix_set_zero( snv_prior_from_cna_baf_map[cn]);
for (int j=0; j<=cn; j++){//j=minor cn
for (int i=0; i<= cn; i++){
double pen = pow( p, max( 0, i - max(j,cn-j)) );
for (int i=0; i<= cn; i++){//penalty for multiple hit SNVs
double pen = pow( snv_pen_mult, max( 0, i - max(j,cn-j)) );
gsl_matrix_set( snv_prior_from_cna_baf_map[cn], i, j, pen);
}
//normalize...
Expand All @@ -151,10 +151,11 @@ void Clone::set_snv_prior_map(){//either via BAF or else via CNA
snv_prior_from_cna_map = gsl_matrix_alloc( maxtcn+1, maxtcn+1);
}
gsl_matrix_set_zero( snv_prior_from_cna_map);
double p = snvEmit->connect ? 1.0 : 0.5;// penalty for high genotypes
double pen = snvEmit->connect ? 1.0 : snv_pen_high;// penalty for high genotypes
for (int cn=0; cn <= maxtcn; cn++){
for (int i=0; i <= cn; i++){
gsl_matrix_set( snv_prior_from_cna_map, i, cn, pow(p,i));
gsl_matrix_set( snv_prior_from_cna_map, 0, cn, pen);
for (int i=1; i<=cn; i++){
gsl_matrix_set( snv_prior_from_cna_map, i, cn, pow(pen,i));
}
//normalize...
gsl_vector_view col = gsl_matrix_column( snv_prior_from_cna_map, cn);
Expand Down Expand Up @@ -223,7 +224,6 @@ void Clone::get_snv_prior_from_cna_baf_post(gsl_vector * prior, gsl_vector * cna
gsl_vector_view prpc;
gsl_vector * bafppc = gsl_vector_alloc(maxtcn+1);
for (int i=0; i<nClones; i++){
//bafppc = gsl_vector_subvector( bafpostpc, i*(maxcn+1), maxcn+1);
prpc = gsl_matrix_row( snv_prpc, i);
for (int cn=0; cn<=maxtcn; cn++){
gsl_vector_set_zero(bafppc);
Expand All @@ -232,6 +232,7 @@ void Clone::get_snv_prior_from_cna_baf_post(gsl_vector * prior, gsl_vector * cna
bafppc->data[j] = gsl_vector_get( bafpostpc, i*(maxtcn+1)+j) + 1.0e-10;
norm += bafppc->data[j];
}
if (norm <= 0.0) abort();
gsl_vector_scale(bafppc,1.0/norm);
double pcna = gsl_vector_get( cnapostpc, i*(maxtcn+1) + cn);
gsl_blas_dgemv( CblasNoTrans, pcna, snv_prior_from_cna_baf_map[cn], bafppc, 1.0, &prpc.vector);
Expand All @@ -246,7 +247,7 @@ void Clone::get_snv_prior_from_cna_baf_post(gsl_vector * prior, gsl_vector * cna


//CNA (+BAF) + SNV mode...
void Clone::apply_snv_prpc( gsl_vector * prior, gsl_matrix * snv_prpc, double pc0){
void Clone::apply_snv_prpc( gsl_vector * prior, gsl_matrix * snv_prpc, double pzero){
gsl_vector_set_all( prior, 1.0);
for (int level=0; level<nLevels; level++){
for (int j=0; j<nClones; j++){
Expand All @@ -257,12 +258,12 @@ void Clone::apply_snv_prpc( gsl_vector * prior, gsl_matrix * snv_prpc, double p
if ( !snvEmit->connect ){
prior->data[0] = 0.0;
double norm = gsl_blas_dasum(prior);
if (norm > 0.0 ) gsl_vector_scale(prior, (1.0-snv_fpr)*(1.0-pc0) / norm);
prior->data[0] = 1.0 - (1.0-snv_fpr)*(1.0-pc0);//SNV false positive rate
if (norm > 0.0 ) gsl_vector_scale(prior, (1.0-snv_fpr)*(1.0-pzero) / norm);
prior->data[0] = 1.0 - (1.0-snv_fpr)*(1.0-pzero);//SNV false positive rate
}
else{
double norm = gsl_blas_dasum(prior);
if (norm <=0.0 || norm!= norm) abort();
if (norm <=0.0 || norm != norm) abort();
gsl_vector_scale(prior, 1.0 / norm);
}
//log-transform?
Expand Down Expand Up @@ -382,20 +383,11 @@ void Clone::get_snv_prior_from_av_cn(gsl_vector * prior, int sample, int evt){
for (int l=1; l<nLevels; l++){
found=0;
prior->data[l] = (snv_prior[snvChr])->data[l];
/*for (int j=0; j<nClones; j++){//genotype bigger than maximum?
if ( copynumber[l][j] > maxtcn_per_clone[snvChr][j]){
prior->data[l] = 0.0;
found = 1;
break;
}
}
if (found) continue;
*/
for (int t=0;t<nTimes;t++){//genotype not available?
for (int cn=0; cn<=maxtcn; cn++){
if (cn_usage[t][cn][l] > snvEmit->av_cn[t][sample][evt][cn]){
found=1;
prior->data[l] *= snv_pen;
found = 1;
prior->data[l] *= snv_pen_mult;//multiple hit SNV
break;
}
if (found) break;
Expand Down
9 changes: 4 additions & 5 deletions src/clone.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@ Clone::Clone(){
nTimes = 0;
nClones = 0;
nLevels = 0;
//
freqs = NULL;
purity = NULL;
min_purity = NULL;
Expand All @@ -24,7 +23,7 @@ Clone::Clone(){
baf_prior_map = NULL;
snv_prior_from_cna_baf_map = NULL;
snv_prior_from_cna_map = NULL;
maxtcn = 4;
maxtcn = 4;
logn_set = 0;
nFreq = 0;
mass = NULL;
Expand All @@ -35,7 +34,6 @@ Clone::Clone(){
copynumber = NULL;
copynumber_post = NULL;
initial_snv_prior_param= NULL;
pinit = 0.5;
learn_priors = 0;
cn_usage = NULL;
clone_spectrum = NULL;
Expand All @@ -44,8 +42,9 @@ Clone::Clone(){
cna_pen_norm = 1.0;
cna_pen_zero = 1.0;
cna_pen_diff = 1.0;
baf_pen = 1.0;
snv_pen = 0.01;
baf_pen_comp = 1.0;
snv_pen_mult = 0.01;
snv_pen_high = 0.5;
snv_fpr = 1.0e-4;
// pre-computed
bafEmitLog = NULL;
Expand Down
4 changes: 2 additions & 2 deletions src/clone.h
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,8 @@ class Clone{
void set_tcn();
// penalties and parameters
double cna_pen_zero, cna_pen_norm, cna_pen_diff;
double baf_pen,snv_pen;
double baf_pen_comp;
double snv_pen_high, snv_pen_mult;
double snv_fpr,snv_fpf;
// pre computed variables (consider unordered_map<>)
map< unsigned int, double> logn;
Expand Down Expand Up @@ -125,7 +126,6 @@ class Clone{
void set_snv_prior( gsl_matrix * prior_param );
gsl_matrix * initial_snv_prior_param;
void initialize_snv_prior_param();
double pinit;
// mean total c.n. and available c.n.
void get_mean_tcn(int sample);//for cna only
void map_mean_tcn( Emission * fromEmit, int from_sample, Emission * toEmit);//from cna/baf to baf/snv
Expand Down
4 changes: 2 additions & 2 deletions src/cloneHD-functions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -372,10 +372,10 @@ void get_avail_cn( const char * avcn_fn, Clone * myClone, Emission * myEmit){
string mode;
while (line_ss.good()) line_ss >> mode;
if (mode.compare("cna")==0){
myClone->pinit = 0.5;
myClone->snv_pen_high = 0.5;
}
else if (mode.compare("baf")==0){
myClone->pinit = 1.0;
myClone->snv_pen_high = 1.0;
}
else{
abort();
Expand Down
5 changes: 3 additions & 2 deletions src/cloneHD-functions.h
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,8 @@ struct cmdl_opts{
double cna_shape, baf_shape, snv_shape;
double cna_rnd, baf_rnd, snv_rnd;
double cna_pen_zero, cna_pen_norm, cna_pen_diff;
double baf_pen, snv_pen;
double baf_pen_comp;
double snv_pen_high, snv_pen_mult;
double snv_fpr, snv_fpf;
double bulk_fix, bulk_sigma, bulk_rnd;
double min_occ,min_jump;
Expand All @@ -63,7 +64,7 @@ void get_opts( int argc, const char ** argv, cmdl_opts& opts);
void read_opts( const char * opts_fn, cmdl_opts& opts);
void default_opts(cmdl_opts& opts);
void test_opts(cmdl_opts& opts);
void print_opts();
void print_usage();

void print_all_results( Clone * myClone, cmdl_opts& opts);
void print_posterior_header( FILE * fp, Clone * myClone, Emission * myEmit, cmdl_opts& opts);
Expand Down
Loading

0 comments on commit 24eb016

Please sign in to comment.