From 57bf037324d423eeee4dd35fec9bfce4122afac0 Mon Sep 17 00:00:00 2001
From: Benjamin James <benjamin-james@utulsa.edu>
Date: Wed, 24 Apr 2019 09:37:36 -0500
Subject: [PATCH] Fixed README.md

---
 README.md | 142 +++++++++++++++++++++++++++---------------------------
 1 file changed, 72 insertions(+), 70 deletions(-)

diff --git a/README.md b/README.md
index e099f63..24fa0ca 100644
--- a/README.md
+++ b/README.md
@@ -6,9 +6,11 @@ g++ 4.9.1 or later, requires Homebrew on Mac OS X
 Compilation using g++ (homebrew) and CMake on Mac OS X see [this link](https://stackoverflow.com/questions/29057437/compile-openmp-programs-with-gcc-compiler-on-os-x-yosemite)
 
 ### Linux/Unix compilation
-> mkdir bin && cd bin
-> cmake ..
-> make
+``` sh
+mkdir bin && cd bin
+cmake ..
+make
+```
 
 ### Citation
 If you find this tool helpful, please cite:
@@ -17,94 +19,94 @@ If you find this tool helpful, please cite:
 
 ### Usage
 
-  Usage: meshclust2 --id 0.x [OPTIONS] *.fasta
+    Usage: meshclust2 --id 0.x [OPTIONS] *.fasta
 
-  --id          The most important parameter, --id, controls the identity cutoff of the sequences.
-                Needs to be between 0 and 1.
-                If it is not specified, an identity of 0.9 is used.
+    --id          The most important parameter, --id, controls the identity cutoff of the sequences.
+                  Needs to be between 0 and 1.
+                  If it is not specified, an identity of 0.9 is used.
 
-  --kmer        decides the size of the kmers. It is by default automatically decided by average sequence
-                length, but if provided, MeShClust can speed up a little by not having to find the largest
-                sequence length. Increasing kmer size can increase accuracy, but increases memory consumption.
+    --kmer        decides the size of the kmers. It is by default automatically decided by average sequence
+                  length, but if provided, MeShClust can speed up a little by not having to find the largest
+                  sequence length. Increasing kmer size can increase accuracy, but increases memory consumption.
 
-  --dump       Run until the classifier is trained, and then dump the weights to the file,
-               default 'weights.txt'. Can be used with --recover to recover the weights
-               instead of re-training.
+    --dump       Run until the classifier is trained, and then dump the weights to the file,
+                 default 'weights.txt'. Can be used with --recover to recover the weights
+                 instead of re-training.
 
-  --recover    Recover weights for the classifier trained by a previous run which used --dump to dump
-               the weights.
+    --recover    Recover weights for the classifier trained by a previous run which used --dump to dump
+                 the weights.
 
-  --list       Instead of specifying files as extra arguments, provide a text file with
-               a list of files. Can use pipes or process substitutions such as "--list <(ls *.fasta) "
+    --list       Instead of specifying files as extra arguments, provide a text file with
+                 a list of files. Can use pipes or process substitutions such as "--list <(ls *.fasta) "
 
-  --no-train-list    Same as --list, but these files are not passed to the classifier,
-                     e.g. unassembled genomes
+    --no-train-list    Same as --list, but these files are not passed to the classifier,
+                       e.g. unassembled genomes
 
-  --mut-type   {single, both, nonsingle-typical, nonsingle-all, all-but-reversion, all-but-translocation}
-               changes the mutation generation algorithm. By default, "both" is used, utilizing
-               single point and block mutations. On higher identity data sets, "single", which includes only single point mutations,
-               is preferable. The option "nonsingle-typical" uses only block mutations,
-               disallowing single point mutations. Other options include "all", which includes single,
-               block, and nontypical mutations translocation and reversion.
+    --mut-type   {single, both, nonsingle-typical, nonsingle-all, all-but-reversion, all-but-translocation}
+                 changes the mutation generation algorithm. By default, "both" is used, utilizing
+                 single point and block mutations. On higher identity data sets, "single", which includes only single point mutations,
+                 is preferable. The option "nonsingle-typical" uses only block mutations,
+                 disallowing single point mutations. Other options include "all", which includes single,
+                 block, and nontypical mutations translocation and reversion.
 
-  --feat       determines the combinations of features to be used. By default, "slow" allows 11
-               combinations to be selected from. "fast" removes 2 slower features from "slow"
-               which include logarithm based features.
+    --feat       determines the combinations of features to be used. By default, "slow" allows 11
+                 combinations to be selected from. "fast" removes 2 slower features from "slow"
+                 which include logarithm based features.
 
-  --single-file  Using this option, (no value is needed), each file is treated as a single sequence.
-                 If multiple sequences in a file are encountered, they are joined with 50 Ns,
-                 and the k-mers are not counted in that region.
-                 However, to be most accurate, it is advised to not use these sequences in the
-                 training step (for mutations) and instead 1) train using un-joined sequences and
-                 use --dump to dump to a file, and 2) use --recover with --single-file for the
-                 file list.
+    --single-file  Using this option, (no value is needed), each file is treated as a single sequence.
+                   If multiple sequences in a file are encountered, they are joined with 50 Ns,
+                   and the k-mers are not counted in that region.
+                   However, to be most accurate, it is advised to not use these sequences in the
+                   training step (for mutations) and instead 1) train using un-joined sequences and
+                   use --dump to dump to a file, and 2) use --recover with --single-file for the
+                   file list.
 
-  --sample     selects the total number of sequences used for both training and testing.
-               2000 is the default value. That is, --sample 2000 provides 2000 training
-               pairs and 2000 testing pairs.
+    --sample     selects the total number of sequences used for both training and testing.
+                 2000 is the default value. That is, --sample 2000 provides 2000 training
+                 pairs and 2000 testing pairs.
 
-  --num-templates   selects the number of "template" sequences from which to mutate.
-               For example, if 300 (the default) templates are requested, and the number of
-               "samples" is requested to be 2000 (the default), 300 sequences will be read in
-               and mutated 2000/300 times each to create 2000 semi-synthetic pairs.
+    --num-templates   selects the number of "template" sequences from which to mutate.
+                 For example, if 300 (the default) templates are requested, and the number of
+                 "samples" is requested to be 2000 (the default), 300 sequences will be read in
+                 and mutated 2000/300 times each to create 2000 semi-synthetic pairs.
 
-  --min-feat   (default 4) sets the minimum feature pairs to be used. If set to 2, at least 2 feature pairs
-               will be used. Recall that features include pairwise combinations of the "feat" option.
+    --min-feat   (default 4) sets the minimum feature pairs to be used. If set to 2, at least 2 feature pairs
+                 will be used. Recall that features include pairwise combinations of the "feat" option.
 
-  --max-feat   (default 4) sets the maximum feature pairs to be used. Diminishing returns appears quickly,
-               so a very large maximum (>10) is not advised.
+    --max-feat   (default 4) sets the maximum feature pairs to be used. Diminishing returns appears quickly,
+                 so a very large maximum (>10) is not advised.
 
-  --min-id     (default 0.35) sets the lower bound for mutation identity scores to be calculated.
-               Shouldn't need to be set normally, as lower identites take much longer,
-               especially with single mutations only.
+    --min-id     (default 0.35) sets the lower bound for mutation identity scores to be calculated.
+                 Shouldn't need to be set normally, as lower identites take much longer,
+                 especially with single mutations only.
 
-  --datatype   (8,16,32,64) Decides the integer size of the histograms. If not provided,
-               all sequences are read in and counted to ensure the largest k-mer does not
-               overflow. If the provided k-mer is too small, it will overflow.
+    --datatype   (8,16,32,64) Decides the integer size of the histograms. If not provided,
+                 all sequences are read in and counted to ensure the largest k-mer does not
+                 overflow. If the provided k-mer is too small, it will overflow.
 
-  --threads    sets the number of threads to be used. By default OpenMP uses the number of available cores
-               on your machine, but this parameter overwrites that.
+    --threads    sets the number of threads to be used. By default OpenMP uses the number of available cores
+                 on your machine, but this parameter overwrites that.
 
-  --output     specifies the output file, in CD-HIT's CLSTR format, described below:
-               A '>Cluster ' followed by an increasing index designates a cluster.
-               Otherwise, the sequence is printed out.
-               A '*' at the end of a sequence designates the center of the cluster.
-               An example of a small data set:
+    --output     specifies the output file, in CD-HIT's CLSTR format, described below:
+                 A '>Cluster ' followed by an increasing index designates a cluster.
+                 Otherwise, the sequence is printed out.
+                 A '*' at the end of a sequence designates the center of the cluster.
+                 An example of a small data set:
 
-               >Cluster 0
-               0       993nt, >seq128 template_6... *
-               >Cluster 1
-               0       1043nt, >seq235 template_10...
-               1       1000nt, >seq216 template_10... *
-               2       1015nt, >seq237 template_10...
+                 >Cluster 0
+                 0       993nt, >seq128 template_6... *
+                 >Cluster 1
+                 0       1043nt, >seq235 template_10...
+                 1       1000nt, >seq216 template_10... *
+                 2       1015nt, >seq237 template_10...
 
-  --delta      decides how many clusters are looked around in the final clustering stage.
-               Increasing it creates more accuracy, but takes more time. Default value is 5.
+    --delta      decides how many clusters are looked around in the final clustering stage.
+                 Increasing it creates more accuracy, but takes more time. Default value is 5.
 
-  --iterations specifies how many iterations in the final stage of merging are done until convergence.
-               Default value is 15.
+    --iterations specifies how many iterations in the final stage of merging are done until convergence.
+                 Default value is 15.
 
-  If the argument is not listed here, it is interpreted as an input (FASTA format) file.
+    If the argument is not listed here, it is interpreted as an input (FASTA format) file.
 
 
 ### License