Merge pull request #13 from Zhong-Lab-UCSD/dev

v1.2 updates
Zhong-Lab-UCSD · Oct 1, 2019 · 7338931
2 parents 3ca8b56 + 7c9c61c
commit 7338931
Showing 17 changed files with 204 additions and 107 deletions.
diff --git a/docs/build/doctrees/commandline_api.doctree b/docs/build/doctrees/commandline_api.doctree
diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
diff --git a/docs/build/doctrees/further_analysis.doctree b/docs/build/doctrees/further_analysis.doctree
diff --git a/docs/build/doctrees/step_by_step_illustration.doctree b/docs/build/doctrees/step_by_step_illustration.doctree
diff --git a/docs/build/html/_sources/commandline_api.md.txt b/docs/build/html/_sources/commandline_api.md.txt
@@ -18,9 +18,11 @@ We created several script tools. Here we show the usage and source code of all t
 
 ``` bash
     Usage: $PROGNAME [-r <ref_name>] [-N <base_name>] [-g <ref_fasta>]
-                     [-c <chromSize_file>] [-i <bwa_index>] [-R <restrict_sites>]
-                     [-G <max_inter_align_gap>] [-O <offset_restriction_site>] [-M <max_ligation_size>]
-                     [-t <threads>] [-1 <fastq.gz_R1>] [-2 <fastq.gz_R2>] [-o <output_dir>] 
+                     [-c <chromSize_file>] [-i <bwa_index>] [-R <restrict_sites>] 
+                     [-Q <min_mapq>] [-G <max_inter_align_gap>]
+                     [-O <offset_restriction_site>] [-M <max_ligation_size>] [-t <threads>]
+                     [-1 <fastq.gz_R1>] [-2 <fastq.gz_R2>]
+                     [-o <output_dir>]
 
     Dependency: seqtk, samtools, bwa, pairtools, pbgzip
 
@@ -34,6 +36,7 @@ We created several script tools. Here we show the usage and source code of all t
     -c : Chromosome size file.
     -i : bwa index
     -R : DNA restriction enzyme digestion fragments bed file.
+    -Q : Min MAPQ value for parsing as unique mapping. Default 1.
     -G : Max inter align gap for pairtools parsing. Default 20. It will allow R1 5' end clipping.
     -O : Max offset bases for filtering pairs based on R2 5' end positions to restriction sites. Default 3.
     -M : Max size of ligation fragment for sequencing. It's used for filtering unligated DNA sequence.
@@ -96,7 +99,7 @@ We created several script tools. Here we show the usage and source code of all t
 [*Source Code*](https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_parse.sh)
 
 ``` bash
-   Usage: $PROGNAME [-r <ref_name>] [-c <chromSize_file>] [-R <restrict_sites>] [-b <bam_file>] [-o <output_dir>] 
+    Usage: $PROGNAME [-r <ref_name>] [-c <chromSize_file>] [-R <restrict_sites>] [-b <bam_file>] [-o <output_dir>] 
                      [-Q <min_mapq>] [-G <max_inter_align_gap>] [-O <offset_restriction_site>] [-M <max_ligation_size>]
                      [-d <drop>] [-D <intermediate_dir>] [-t <threads>] 
 
@@ -112,8 +115,8 @@ We created several script tools. Here we show the usage and source code of all t
     -o : Output directory
     -Q : Min MAPQ value, default 1.
     -G : Max inter align gap for pairtools parsing. Default 20. It will allow R1 5' end clipping.
-    -O : Max mis-offset bases for filtering pairs based on R2 5' end positions to restriction sites. Default 0.
-    -M : Max size of ligation fragment for sequencing. It's used for filtering unligated DNA sequence.
+    -O : Max mis-offset bases for filtering pairs based on R2 5' end positions to restriction sites. Default 3.
+    -M : Max size of ligation fragment for sequencing. It's used for filtering unligated DNA sequence. Default 1000.
     -d : Flag of dropping. Default is false, i.e., output all the intermediate results.
     -D : Directory for intermediate results. Works when -d false. Default is a sub-folder "intermediate_results" 
          in output directory.
@@ -173,19 +176,32 @@ We created several script tools. Here we show the usage and source code of all t
 [*Source Code*](https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_convert.sh)
 
 ``` bash
-    Usage: $PROGNAME [-f <file_format>] [-k <keep_cols>] [-b <bin_size>] [-i <input_file>] [-o <output_file>] 
+    Usage: $PROGNAME [-f <file_format>] [-k <keep_cols>] 
+                     [-b <bin_size>] [-r <resolution>] [-T <transpose>] 
+                     [-i <input_file>] [-o <output_file>] 
 
     Dependency: gzip, awk, cool
     This script can convert .pairs format to BEDPE, .cool, and GIVE interaction format.
-    -f : The target format, only accept 'cool', 'bedpe' and 'give'. For 'cool', it will generate
-         a ".cool" file with defined resolution of -b option and a multi-resolution ".mcool" file
-         based on the ".cool" file. For 'bedpe', the output will be pbgzip compressed file. So
-         keep in mind to name the output_file '-o' with '.gz' extesion.
-    -k : Keep extra information column in BEDPE. Columns ids in .pairs file you want to keep.
+    -f : The target format; only 'cool', 'bedpe' and 'give' are accepted. For 'cool', it will generate a ".cool" file
+         at the bin size defined by the -b option and a multi-resolution ".mcool" file, at the resolutions defined by
+         the -r option, based on the ".cool" file. For 'bedpe', the output will be a pbgzip-compressed file, so keep
+         in mind to name the output_file '-o' with a '.gz' extension. For 'give', the output is a normal text file.
+    -k : (Only for BEDPE) Keep extra information column in BEDPE. Columns ids in .pairs file you want to keep.
          For example, 'cigar1,cigar2'. Default value is "", i.e., drop all extra cols.
-    -b : bin size for cool format. Default is 5000.
+    -b : (Only for cool/mcool) bin size for cool format. Default is 1000.
+    -r : (Only for cool/mcool) resolution for cool/mcool format. Integers separated by comma. The values of resolution
+         must be integer multiples of the bin size defined by -b option.
+         Default is 1000,2000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000,5000000,10000000
+    -T : (Only for cool/mcool) An mcool file can be visualized by HiGlass. Currently the heatmap orientation cannot be
+         set in the HiGlass control panel, so if you want to transpose the interaction map in HiGlass, you need to
+         generate a transposed mcool file. The default value is 'false', i.e., no transpose: the RNA-DNA interactions
+         will be mapped to an X-Y system as a DNA x RNA contact matrix. If you set '-T true', the generated cool/mcool
+         file is transposed, i.e., an RNA x DNA contact matrix.
     -i : Input file.
-    -o : Output file. BEDPE output is gzip compressed file. cool output are .cool and .mcool files.
+    -o : Output file.
+         BEDPE output is a gzip-compressed file, so it's better to use a .gz file extension.
+         cool output consists of two files, .cool and .mcool. The -o option sets the name of the .cool file, which must
+         use .cool as its extension. The .mcool file is generated from the .cool file with a .mcool extension.
     -h : Show usage help
 ```
 

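The `-r` help text above requires every resolution to be an integer multiple of the `-b` bin size. A minimal sketch of that check, useful before launching a long conversion, might look like the following. `check_resolutions` is a hypothetical helper, not part of iMARGI-Docker, and it assumes a positive integer bin size.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of iMARGI-Docker): verifies that every
# value in a comma-separated -r list is a positive integer multiple of the -b bin size.
# Assumes bin_size itself is a positive integer.
check_resolutions() {
    local bin_size="$1" res_list="$2" res
    local IFS=','
    for res in $res_list; do
        # Reject empty or non-numeric entries.
        case "$res" in
            ''|*[!0-9]*) echo "invalid"; return 1 ;;
        esac
        # Force base 10 so values with leading zeros are not read as octal.
        if [ $(( 10#$res )) -eq 0 ] || [ $(( 10#$res % 10#$bin_size )) -ne 0 ]; then
            echo "invalid"; return 1
        fi
    done
    echo "ok"
}
```

If the check prints `ok`, the same values could then be passed on, for example as `imargi_convert.sh -f cool -b 1000 -r 1000,5000,10000 -T true -i <input_file> -o <output_file>`, where the output name must end in `.cool` as described under `-o`.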
diff --git a/docs/build/html/_sources/further_analysis.md.txt b/docs/build/html/_sources/further_analysis.md.txt
@@ -137,7 +137,15 @@ convert .pairs file to .cool/.mcool file.
 Please read the [HiGlass documentation](https://github.com/higlass/higlass/wiki) to know how to use it. Besides, there
 is a Jupyter Notebook version of HiGlass, [jupyter-higlass](https://github.com/higlass/higlass-jupyter).
 
-![HiGLass view](./figures/higlass_view.png)
+**Note:** In the iMARGI `.pairs` file, the RNA coordinate is `c1:p1` and the DNA coordinate is `c2:p2`. We can directly
+generate a `.mcool` file for HiGlass using the `imargi_convert.sh` script. When HiGlass renders the heatmap view from the
+`.mcool` file, it uses an X-Y coordinate system, where X is `c1:p1` and Y is `c2:p2`, so it shows a heatmap of the
+DNA x RNA matrix, i.e., rows are DNA and columns are RNA (as in the figure below). Currently, if you want to transpose
+it, you have to generate a transposed `.mcool` file by setting `-T true` when you run the `imargi_convert.sh` script.
+The HiGlass team will add a "customizable transpose" function to its control panel in its next update, after which you
+won't need to worry about this.
+
+![HiGlass view (row is DNA and column is RNA)](./figures/higlass_view.png)
 
 ### GIVE
 

diff --git a/docs/build/html/_sources/step_by_step_illustration.md.txt b/docs/build/html/_sources/step_by_step_illustration.md.txt
@@ -402,14 +402,14 @@ Here we describe each line of the log file (TAB separated text file).
 Here is the output `pipelineStats_test_sample.log` file in our example.
 
 ``` 
-Sequence mapping QC passed
-(#unique_mapped_pairs + #single_side_unique_mapped)/#total_read_pairs   0.777777
-#total_valid_interactions/#unique_mapped_pairs  0.761476
-total_read_pairs    900000
-single_side_unique_mapped   2
-unique_mapped_pairs 699997
-non_dup_unique_mapped_paris 691967
-total_valid_interactions    533031
-inter_chr   244163
-intra_chr   288868
+Sequence mapping QC     passed
+(#unique_mapped_pairs + #single_side_unique_mapped)/#total_read_pairs   0.785859
+#total_valid_interactions/#nondup_unique_mapped_pairs   0.768195
+total_read_pairs        900000
+single_side_unique_mapped       3342
+unique_mapped_pairs     703931
+nondup_unique_mapped_pairs      694706
+total_valid_interactions        533670
+inter_chr       244208
+intra_chr       289462
 ```
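The two ratio lines in the updated log can be recomputed from the count lines, which is a quick way to confirm that the second ratio now uses `nondup_unique_mapped_pairs` as its denominator. The sketch below assumes the log is TAB-separated, as the documentation states, and that the denominators are non-zero. `qc_ratios` is a hypothetical helper name, not a script shipped with the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: recompute the two QC ratios from a pipelineStats log on stdin.
# Assumes TAB-separated "name<TAB>value" lines and non-zero denominators.
qc_ratios() {
    awk -F'\t' '
        # Remember every name/value pair; non-numeric lines are stored but never used.
        { count[$1] = $2 }
        END {
            # (#unique_mapped_pairs + #single_side_unique_mapped) / #total_read_pairs
            printf "%.6f\n", (count["unique_mapped_pairs"] + count["single_side_unique_mapped"]) / count["total_read_pairs"]
            # #total_valid_interactions / #nondup_unique_mapped_pairs
            printf "%.6f\n", count["total_valid_interactions"] / count["nondup_unique_mapped_pairs"]
        }'
}
```

Running `qc_ratios < pipelineStats_test_sample.log` on the new example log should reproduce the two ratio values it reports.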
diff --git a/docs/build/html/commandline_api.html b/docs/build/html/commandline_api.html
@@ -180,9 +180,11 @@ <h1>Command-line API<a class="headerlink" href="#command-line-api" title="Permal
 <h2>imargi_wrapper.sh<a class="headerlink" href="#imargi-wrapper-sh" title="Permalink to this headline">¶</a></h2>
 <p><a class="reference external" href="https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_wrapper.sh"><em>Source Code</em></a></p>
 <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>    Usage: $PROGNAME [-r &lt;ref_name&gt;] [-N &lt;base_name&gt;] [-g &lt;ref_fasta&gt;]
-                     [-c &lt;chromSize_file&gt;] [-i &lt;bwa_index&gt;] [-R &lt;restrict_sites&gt;]
-                     [-G &lt;max_inter_align_gap&gt;] [-O &lt;offset_restriction_site&gt;] [-M &lt;max_ligation_size&gt;]
-                     [-t &lt;threads&gt;] [-1 &lt;fastq.gz_R1&gt;] [-2 &lt;fastq.gz_R2&gt;] [-o &lt;output_dir&gt;] 
+                     [-c &lt;chromSize_file&gt;] [-i &lt;bwa_index&gt;] [-R &lt;restrict_sites&gt;] 
+                     [-Q &lt;min_mapq&gt;] [-G &lt;max_inter_align_gap&gt;]
+                     [-O &lt;offset_restriction_site&gt;] [-M &lt;max_ligation_size&gt;] [-t &lt;threads&gt;]
+                     [-1 &lt;fastq.gz_R1&gt;] [-2 &lt;fastq.gz_R2&gt;]
+                     [-o &lt;output_dir&gt;]
 
     Dependency: seqtk, samtools, bwa, pairtools, pbgzip
 
@@ -196,6 +198,7 @@ <h2>imargi_wrapper.sh<a class="headerlink" href="#imargi-wrapper-sh" title="Perm
     -c : Chromosome size file.
     -i : bwa index
     -R : DNA restriction enzyme digestion fragments bed file.
+    -Q : Min MAPQ value for parsing as unique mapping. Default 1.
     -G : Max inter align gap for pairtools parsing. Default 20. It will allow R1 5&#39; end clipping.
     -O : Max offset bases for filtering pairs based on R2 5&#39; end positions to restriction sites. Default 3.
     -M : Max size of ligation fragment for sequencing. It&#39;s used for filtering unligated DNA sequence.
@@ -255,7 +258,7 @@ <h2>imargi_rsfrags.sh<a class="headerlink" href="#imargi-rsfrags-sh" title="Perm
 <div class="section" id="imargi-parse-sh">
 <h2>imargi_parse.sh<a class="headerlink" href="#imargi-parse-sh" title="Permalink to this headline">¶</a></h2>
 <p><a class="reference external" href="https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_parse.sh"><em>Source Code</em></a></p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>   Usage: $PROGNAME [-r &lt;ref_name&gt;] [-c &lt;chromSize_file&gt;] [-R &lt;restrict_sites&gt;] [-b &lt;bam_file&gt;] [-o &lt;output_dir&gt;] 
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>    Usage: $PROGNAME [-r &lt;ref_name&gt;] [-c &lt;chromSize_file&gt;] [-R &lt;restrict_sites&gt;] [-b &lt;bam_file&gt;] [-o &lt;output_dir&gt;] 
                      [-Q &lt;min_mapq&gt;] [-G &lt;max_inter_align_gap&gt;] [-O &lt;offset_restriction_site&gt;] [-M &lt;max_ligation_size&gt;]
                      [-d &lt;drop&gt;] [-D &lt;intermediate_dir&gt;] [-t &lt;threads&gt;] 
 
@@ -271,8 +274,8 @@ <h2>imargi_parse.sh<a class="headerlink" href="#imargi-parse-sh" title="Permalin
     -o : Output directory
     -Q : Min MAPQ value, default 1.
     -G : Max inter align gap for pairtools parsing. Default 20. It will allow R1 5&#39; end clipping.
-    -O : Max mis-offset bases for filtering pairs based on R2 5&#39; end positions to restriction sites. Default 0.
-    -M : Max size of ligation fragment for sequencing. It&#39;s used for filtering unligated DNA sequence.
+    -O : Max mis-offset bases for filtering pairs based on R2 5&#39; end positions to restriction sites. Default 3.
+    -M : Max size of ligation fragment for sequencing. It&#39;s used for filtering unligated DNA sequence. Default 1000.
     -d : Flag of dropping. Default is false, i.e., output all the intermediate results.
     -D : Directory for intermediate results. Works when -d false. Default is a sub-folder &quot;intermediate_results&quot; 
          in output directory.
@@ -329,20 +332,33 @@ <h2>imargi_distfilter.sh<a class="headerlink" href="#imargi-distfilter-sh" title
 <div class="section" id="imargi-convert-sh">
 <h2>imargi_convert.sh<a class="headerlink" href="#imargi-convert-sh" title="Permalink to this headline">¶</a></h2>
 <p><a class="reference external" href="https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_convert.sh"><em>Source Code</em></a></p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>    Usage: <span class="nv">$PROGNAME</span> <span class="o">[</span>-f &lt;file_format&gt;<span class="o">]</span> <span class="o">[</span>-k &lt;keep_cols&gt;<span class="o">]</span> <span class="o">[</span>-b &lt;bin_size&gt;<span class="o">]</span> <span class="o">[</span>-i &lt;input_file&gt;<span class="o">]</span> <span class="o">[</span>-o &lt;output_file&gt;<span class="o">]</span> 
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>    Usage: $PROGNAME [-f &lt;file_format&gt;] [-k &lt;keep_cols&gt;] 
+                     [-b &lt;bin_size&gt;] [-r &lt;resolution&gt;] [-T &lt;transpose&gt;] 
+                     [-i &lt;input_file&gt;] [-o &lt;output_file&gt;] 
 
     Dependency: gzip, awk, cool
     This script can convert .pairs format to BEDPE, .cool, and GIVE interaction format.
-    -f : The target format, only accept <span class="s1">&#39;cool&#39;</span>, <span class="s1">&#39;bedpe&#39;</span> and <span class="s1">&#39;give&#39;</span>. For <span class="s1">&#39;cool&#39;</span>, it will generate
-         a <span class="s2">&quot;.cool&quot;</span> file with defined resolution of -b option and a multi-resolution <span class="s2">&quot;.mcool&quot;</span> file
-         based on the <span class="s2">&quot;.cool&quot;</span> file. For <span class="s1">&#39;bedpe&#39;</span>, the output will be pbgzip compressed file. So
-         keep in mind to name the output_file <span class="s1">&#39;-o&#39;</span> with <span class="s1">&#39;.gz&#39;</span> extesion.
-    -k : Keep extra information column in BEDPE. Columns ids in .pairs file you want to keep.
-         For example, <span class="s1">&#39;cigar1,cigar2&#39;</span>. Default value is <span class="s2">&quot;&quot;</span>, i.e., drop all extra cols.
-    -b : bin size <span class="k">for</span> cool format. Default is <span class="m">5000</span>.
+    -f : The target format; only &#39;cool&#39;, &#39;bedpe&#39; and &#39;give&#39; are accepted. For &#39;cool&#39;, it will generate a &quot;.cool&quot; file
+         at the bin size defined by the -b option and a multi-resolution &quot;.mcool&quot; file, at the resolutions defined by
+         the -r option, based on the &quot;.cool&quot; file. For &#39;bedpe&#39;, the output will be a pbgzip-compressed file, so keep
+         in mind to name the output_file &#39;-o&#39; with a &#39;.gz&#39; extension. For &#39;give&#39;, the output is a normal text file.
+    -k : (Only for BEDPE) Keep extra information column in BEDPE. Columns ids in .pairs file you want to keep.
+         For example, &#39;cigar1,cigar2&#39;. Default value is &quot;&quot;, i.e., drop all extra cols.
+    -b : (Only for cool/mcool) bin size for cool format. Default is 1000.
+    -r : (Only for cool/mcool) resolution for cool/mcool format. Integers separated by comma. The values of resolution
+         must be integer multiples of the bin size defined by -b option.
+         Default is 1000,2000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000,5000000,10000000
+    -T : (Only for cool/mcool) An mcool file can be visualized by HiGlass. Currently the heatmap orientation cannot be
+         set in the HiGlass control panel, so if you want to transpose the interaction map in HiGlass, you need to
+         generate a transposed mcool file. The default value is &#39;false&#39;, i.e., no transpose: the RNA-DNA interactions
+         will be mapped to an X-Y system as a DNA x RNA contact matrix. If you set &#39;-T true&#39;, the generated cool/mcool
+         file is transposed, i.e., an RNA x DNA contact matrix.
     -i : Input file.
-    -o : Output file. BEDPE output is gzip compressed file. cool output are .cool and .mcool files.
-    -h : Show usage <span class="nb">help</span>
+    -o : Output file.
+         BEDPE output is a gzip-compressed file, so it&#39;s better to use a .gz file extension.
+         cool output consists of two files, .cool and .mcool. The -o option sets the name of the .cool file, which must
+         use .cool as its extension. The .mcool file is generated from the .cool file with a .mcool extension.
+    -h : Show usage help
 </pre></div>
 </div>
 </div>

diff --git a/docs/build/html/further_analysis.html b/docs/build/html/further_analysis.html
@@ -295,7 +295,14 @@ <h3>HiGlass<a class="headerlink" href="#higlass" title="Permalink to this headli
 convert .pairs file to .cool/.mcool file.
 Please read the <a class="reference external" href="https://github.com/higlass/higlass/wiki">HiGlass documentation</a> to know how to use it. Besides, there
 is a Jupyter Notebook version of HiGlass, <a class="reference external" href="https://github.com/higlass/higlass-jupyter">jupyter-higlass</a>.</p>
-<p><img alt="_images/higlass_view.png" src="_images/higlass_view.png" />HiGLass view</p>
+<p><strong>Note:</strong> In the iMARGI <code class="docutils literal notranslate"><span class="pre">.pairs</span></code> file, the RNA coordinate is <code class="docutils literal notranslate"><span class="pre">c1:p1</span></code> and the DNA coordinate is <code class="docutils literal notranslate"><span class="pre">c2:p2</span></code>. We can directly
+generate a <code class="docutils literal notranslate"><span class="pre">.mcool</span></code> file for HiGlass using the <code class="docutils literal notranslate"><span class="pre">imargi_convert.sh</span></code> script. When HiGlass renders the heatmap view from the
+<code class="docutils literal notranslate"><span class="pre">.mcool</span></code> file, it uses an X-Y coordinate system, where X is <code class="docutils literal notranslate"><span class="pre">c1:p1</span></code> and Y is <code class="docutils literal notranslate"><span class="pre">c2:p2</span></code>, so it shows a heatmap of the
+DNA x RNA matrix, i.e., rows are DNA and columns are RNA (as in the figure below). Currently, if you want to transpose
+it, you have to generate a transposed <code class="docutils literal notranslate"><span class="pre">.mcool</span></code> file by setting <code class="docutils literal notranslate"><span class="pre">-T</span> <span class="pre">true</span></code> when you run the <code class="docutils literal notranslate"><span class="pre">imargi_convert.sh</span></code> script.
+The HiGlass team will add a “customizable transpose” function to its control panel in its next update, after which you
+won’t need to worry about this.</p>
+<p><img alt="_images/higlass_view.png" src="_images/higlass_view.png" />HiGlass view (row is DNA and column is RNA)</p>
 </div>
 <div class="section" id="give">
 <h3>GIVE<a class="headerlink" href="#give" title="Permalink to this headline">¶</a></h3>

diff --git a/docs/build/html/searchindex.js b/docs/build/html/searchindex.js