fixed bugs in pre-filter, thanks J.Gerling

ivazquez · Apr 24, 2014 · 1900b37
1 parent 5f8b4d1
Showing 1 changed file with 25 additions and 17 deletions.
diff --git a/README.md b/README.md
@@ -24,6 +24,14 @@ in the `src` directory. The executables will be in `build`. For debugging with g
 
 To report bugs, use the [issue](https://github.com/andrej-fischer/cloneHD/issues) interface of github.
 
+# Full documentation
+
+The full documentation can be found in the `/docs/` subfolder. Click below.
+
+*  [pre-filter](/docs/README-pre-filter.md)
+*  [filterHD](/docs/README-filterHD.md)
+*  [cloneHD](/docs/README-cloneHD.md)
+
 # What are cloneHD and filterHD for?
 
 cloneHD is software for reconstructing the subclonal structure of a
@@ -59,44 +67,44 @@ prediction (red).
 (vii) The observed SNV frequencies, corrected for local ploidy, and per genotype (SNVs are assigned randomly according to the cloneHD SNV posterior).
 (All plots are created with Wolfram [Mathematica](http://www.wolfram.com/mathematica/).)
 
-# Full documentation
-
-The full documentation can be found in the `/docs/` subfolder. Click below.
-
-*  [pre-filter](/docs/README-pre-filter.md)
-*  [filterHD](/docs/README-filterHD.md)
-*  [cloneHD](/docs/README-cloneHD.md)
-
 # Tips and tricks
 
+*  Note: all input files are assumed to be sorted by genomic coordinate. On Unix, this
+   can be guaranteed with `sort -k1n,1 -k2n,2 file.txt > sorted-file.txt`.
+
 *  Pre-filtering of data can be very important. If filterHD predicts
    many more jumps than you would expect, it might be necessary to
    pre-filter the data, removing variable regions, outliers or very short 
-   segments (use programs `pre-filter` and `filterHD`).
+   segments (use the programs `pre-filter` and `filterHD`).
 
 *  Make sure that the bias field for the tumor CNA data is
    meaningful. If a matched normal sample was sequenced with the same
    pipeline, its read depth profile, as predicted by filterHD, can be used as a
    bias field for the tumor CNA data. Follow the logic of the example
    given here.
 
-*  If the matched-normal sample was sequenced at lower coverage than the tumor sample, it might be necessary to run filterHD with a higher-than-optimal diffusion constant (set with `--sigma [double]`) to obtain a more faithful bias field. Otherwise, the filterHD solution is too stiff and you loose bias detail.
+*  If the matched-normal sample was sequenced at lower coverage than the tumor sample, 
+   it might be necessary to run filterHD with a higher-than-optimal diffusion constant 
+   (set with `--sigma [double]`) to obtain a more faithful bias field. Otherwise, the 
+   filterHD solution is too stiff and you lose bias detail.
 
 *  filterHD can sometimes run into local optima. In this case, it might be useful to
    set initial values for the parameters via `--jumpi [double]` etc.
 
-*  By default, cloneHD runs with mass gauging enabled. This seems like
-   an overkill, but is actually quite useful because you can see some
-   alternative explanations during the course of the analysis.
+*  By default, cloneHD runs with mass-gauging enabled. This seems wasteful,
+   but is actually quite useful because you can see some alternative explanations
+   during the course of the analysis.
 
 *  Don't put too much weight on the BIC criterion. It was calibrated
    using simulated data. For real data, it should be supplemented with
    common sense and biological knowledge. Use `--force [int]` to use a
    fixed number of subclones and `--max-tcn [int]` to set the maximum possible total
    copy number.
 
-*  For exome sequencing data, the read depth bias can be enormous. The filterHD estimate 
-   of the bias field might not be useful, especially in segmenting the data.
-   Use rather, if available, the jumps seen in the BAF data for both CNA and BAF.
-
+*  If high copy numbers are expected only in a few chromosomes, you can increase performance
+   by using the `--max-tcn [file]` option to specify per-chromosome upper limits.
 
+*  For exome sequencing data, the read depth bias can be enormous. The filterHD estimate 
+   of the bias field might not be very useful, especially in segmenting the tumor data.
+   Instead, if available, use the jumps seen in the BAF data for both the CNA and BAF data
+   (give the BAF jumps file to both `--cna-jumps` and `--baf-jumps`).
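Because the README now requires all input files to be sorted by genomic coordinate, it can help to check a file before handing it to `pre-filter`, `filterHD` or cloneHD. The following is a minimal sketch, not part of cloneHD: it assumes whitespace-separated columns with a numeric chromosome in column 1 and a numeric position in column 2 (the same keys as the README's `sort -k1n,1 -k2n,2` command), and the function name `is_sorted` is hypothetical.

```shell
# Hypothetical helper (not part of cloneHD): succeed if the file is ordered by
# column 1 (numeric chromosome) and then column 2 (numeric position), i.e. the
# same keys as `sort -k1n,1 -k2n,2`.
#   -C  check the order quietly; the result is reported only via the exit status
#   -s  skip the last-resort whole-line comparison, so rows that share the same
#       chromosome and position are accepted in any order
is_sorted() {
    sort -C -s -k1n,1 -k2n,2 "$1"
}
```

For example, `is_sorted file.txt || sort -k1n,1 -k2n,2 file.txt > sorted-file.txt` re-sorts only when needed. Note that with `-n`, non-numeric chromosome names such as `X` or `Y` compare as 0, so they would need to be recoded as numbers for either command to order them as intended.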