Merge pull request #13 from Zhong-Lab-UCSD/dev
v1.2 updates
frankyan authored Oct 1, 2019
2 parents 3ca8b56 + 7c9c61c commit 7338931
Showing 17 changed files with 204 additions and 107 deletions.
Binary file modified docs/build/doctrees/commandline_api.doctree
Binary file modified docs/build/doctrees/environment.pickle
Binary file modified docs/build/doctrees/further_analysis.doctree
Binary file modified docs/build/doctrees/step_by_step_illustration.doctree
44 changes: 30 additions & 14 deletions docs/build/html/_sources/commandline_api.md.txt
@@ -18,9 +18,11 @@ We created several script tools. Here we show the usage and source code of all t

``` bash
Usage: $PROGNAME [-r <ref_name>] [-N <base_name>] [-g <ref_fasta>]
[-c <chromSize_file>] [-i <bwa_index>] [-R <restrict_sites>]
[-G <max_inter_align_gap>] [-O <offset_restriction_site>] [-M <max_ligation_size>]
[-t <threads>] [-1 <fastq.gz_R1>] [-2 <fastq.gz_R2>] [-o <output_dir>]
[-c <chromSize_file>] [-i <bwa_index>] [-R <restrict_sites>]
[-Q <min_mapq>] [-G <max_inter_align_gap>]
[-O <offset_restriction_site>] [-M <max_ligation_size>] [-t <threads>]
[-1 <fastq.gz_R1>] [-2 <fastq.gz_R2>]
[-o <output_dir>]

Dependency: seqtk, samtools, bwa, pairtools, pbgzip

@@ -34,6 +36,7 @@ We created several script tools. Here we show the usage and source code of all t
-c : Chromosome size file.
-i : bwa index
-R : DNA restriction enzyme digestion fragments bed file.
-Q : Min MAPQ value for parsing as unique mapping. Default 1.
-G : Max inter align gap for pairtools parsing. Default 20. It will allow R1 5' end clipping.
-O : Max offset bases for filtering pairs based on R2 5' end positions to restriction sites. Default 3.
-M : Max size of ligation fragment for sequencing. It's used for filtering unligated DNA sequence.
@@ -96,7 +99,7 @@ We created several script tools. Here we show the usage and source code of all t
[*Source Code*](https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_parse.sh)

``` bash
Usage: $PROGNAME [-r <ref_name>] [-c <chromSize_file>] [-R <restrict_sites>] [-b <bam_file>] [-o <output_dir>]
Usage: $PROGNAME [-r <ref_name>] [-c <chromSize_file>] [-R <restrict_sites>] [-b <bam_file>] [-o <output_dir>]
[-Q <min_mapq>] [-G <max_inter_align_gap>] [-O <offset_restriction_site>] [-M <max_ligation_size>]
[-d <drop>] [-D <intermediate_dir>] [-t <threads>]

@@ -112,8 +115,8 @@ We created several script tools. Here we show the usage and source code of all t
-o : Output directory
-Q : Min MAPQ value, default 1.
-G : Max inter align gap for pairtools parsing. Default 20. It will allow R1 5' end clipping.
-O : Max mis-offset bases for filtering pairs based on R2 5' end positions to restriction sites. Default 0.
-M : Max size of ligation fragment for sequencing. It's used for filtering unligated DNA sequence.
-O : Max mis-offset bases for filtering pairs based on R2 5' end positions to restriction sites. Default 3.
-M : Max size of ligation fragment for sequencing. It's used for filtering unligated DNA sequence. Default 1000.
-d : Flag of dropping. Default is false, i.e., output all the intermediate results.
-D : Directory for intermediate results. Works when -d false. Default is a sub-folder "intermediate_results"
in output directory.
@@ -173,19 +176,32 @@ We created several script tools. Here we show the usage and source code of all t
[*Source Code*](https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_convert.sh)

``` bash
Usage: $PROGNAME [-f <file_format>] [-k <keep_cols>] [-b <bin_size>] [-i <input_file>] [-o <output_file>]
Usage: $PROGNAME [-f <file_format>] [-k <keep_cols>]
[-b <bin_size>] [-r <resolution>] [-T <transpose>]
[-i <input_file>] [-o <output_file>]

Dependency: gzip, awk, cool
This script can convert .pairs format to BEDPE, .cool, and GIVE interaction format.
-f : The target format, only accept 'cool', 'bedpe' and 'give'. For 'cool', it will generate
a ".cool" file with defined resolution of -b option and a multi-resolution ".mcool" file
based on the ".cool" file. For 'bedpe', the output will be pbgzip compressed file. So
keep in mind to name the output_file '-o' with '.gz' extesion.
-k : Keep extra information column in BEDPE. Columns ids in .pairs file you want to keep.
-f : The target format, only accept 'cool', 'bedpe' and 'give'. For 'cool', it will generate a ".cool" file
with defined resolution of -b option and a -r defined multi-resolution ".mcool" file based on the ".cool" file.
For 'bedpe', the output will be pbgzip compressed file. So keep in mind to name the output_file '-o' with
'.gz' extension. For 'give', the output is a normal text file.
-k : (Only for BEDPE) Keep extra information column in BEDPE. Columns ids in .pairs file you want to keep.
For example, 'cigar1,cigar2'. Default value is "", i.e., drop all extra cols.
-b : bin size for cool format. Default is 5000.
-b : (Only for cool/mcool) bin size for cool format. Default is 1000.
-r : (Only for cool/mcool) resolution for cool/mcool format. Integers separated by comma. The values of resolution
must be integer multiples of the bin size defined by -b option.
Default is 1000,2000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000,5000000,10000000
-T : (Only for cool/mcool) mcool file can be visualized by HiGlass. Currently the heatmap orientation cannot be
set in HiGlass control panel. So if you want to transpose the interaction map in HiGlass, you need to generate
a transposed mcool file. The default value is 'false', i.e., no transpose, the RNA-DNA interactions will be
mapped to an X-Y system as a DNA x RNA contact matrix. If set '-T true', then the generated cool/mcool file is
transposed, which is RNA x DNA contact matrix.
-i : Input file.
-o : Output file. BEDPE output is gzip compressed file. cool output are .cool and .mcool files.
-o : Output file.
BEDPE output is gzip compressed file, so it's better to have a .gz file extension.
cool output are two files, .cool and .mcool. The -o option assigns the name of .cool file, it must use .cool as
extension. The .mcool file will be generated based on the .cool file with .mcool extension.
-h : Show usage help
```
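The new `-r` option requires every resolution to be an integer multiple of the `-b` bin size. The bash function below is a hypothetical pre-flight check, not part of iMARGI (`check_resolutions` is an illustrative name), that applies this rule to a resolution list before calling `imargi_convert.sh`. It is a minimal sketch assuming both values are plain positive integers.

```bash
# Hypothetical helper (not part of iMARGI): check that every -r resolution
# is a positive integer multiple of the -b bin size.
check_resolutions() {
    local bin_size="$1"      # value intended for -b
    local resolutions="$2"   # comma-separated value intended for -r
    local res
    local -a res_list

    # The bin size itself must be a positive integer.
    if ! [[ "$bin_size" =~ ^[0-9]+$ ]] || (( 10#$bin_size == 0 )); then
        echo "invalid bin size: '$bin_size'" >&2
        return 1
    fi

    # Split the comma-separated list into an array.
    IFS=',' read -r -a res_list <<< "$resolutions"
    for res in "${res_list[@]}"; do
        # Each entry must be a positive integer ...
        if ! [[ "$res" =~ ^[0-9]+$ ]] || (( 10#$res == 0 )); then
            echo "invalid resolution: '$res'" >&2
            return 1
        fi
        # ... and an integer multiple of the bin size.
        if (( 10#$res % 10#$bin_size != 0 )); then
            echo "resolution $res is not a multiple of bin size $bin_size" >&2
            return 1
        fi
    done
    return 0
}

# The documented v1.2 defaults: -b 1000 with the default -r list.
defaults="1000,2000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000,5000000,10000000"
check_resolutions 1000 "$defaults" && echo "defaults OK"
```

With the new defaults every resolution passes; a list containing `1500` together with `-b 1000` would be rejected before any conversion starts.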

10 changes: 9 additions & 1 deletion docs/build/html/_sources/further_analysis.md.txt
@@ -137,7 +137,15 @@ convert .pairs file to .cool/.mcool file.
Please read the [HiGlass documentation](https://github.com/higlass/higlass/wiki) to learn how to use it. There
is also a Jupyter Notebook version of HiGlass, [jupyter-higlass](https://github.com/higlass/higlass-jupyter).

![HiGLass view](./figures/higlass_view.png)
**Note:** In the iMARGI `.pairs` file, the RNA coordinate is `c1:p1` and the DNA coordinate is `c2:p2`. The
`imargi_convert.sh` script can generate a `.mcool` file for HiGlass directly. When HiGlass renders the heatmap from the
`.mcool` file, it uses an X-Y coordinate system in which X is `c1:p1` and Y is `c2:p2`, so the heatmap shows a
DNA x RNA matrix, i.e., rows are DNA and columns are RNA (as in the figure below). Currently, to transpose the map,
you have to generate a transposed `.mcool` file by setting `-T true` when running `imargi_convert.sh`. The HiGlass team
will add a "customizable transpose" function to its control panel in its next version, after which this step
will no longer be necessary.

![HiGlass view (row is DNA and column is RNA)](./figures/higlass_view.png)
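Because the `.pairs` file stores RNA at `c1:p1` and DNA at `c2:p2`, a transposed (RNA x DNA) map corresponds to swapping the two ends of every pair. The awk sketch below only illustrates that swap; it is not how `imargi_convert.sh -T true` is implemented. It assumes the standard `.pairs` column order `readID chr1 pos1 chr2 pos2 strand1 strand2` with TAB separators, and `swap_pair_ends` is a hypothetical helper name.

```bash
# Illustration only (not the imargi_convert.sh implementation): swap the
# RNA end (chr1/pos1/strand1) and the DNA end (chr2/pos2/strand2) of each
# pair, assuming standard .pairs column order and TAB separators.
swap_pair_ends() {
    awk 'BEGIN { FS = OFS = "\t" }
        # Header lines (starting with "#") are passed through unchanged.
        /^#/ { print; next }
        {
            t = $2; $2 = $4; $4 = t   # chromosome: RNA end <-> DNA end
            t = $3; $3 = $5; $5 = t   # position
            t = $6; $6 = $7; $7 = t   # strand
            print                     # any further columns are left untouched
        }'
}

# Example: one RNA-DNA pair, RNA end on chr1, DNA end on chr2.
printf 'read1\tchr1\t100\tchr2\t5000\t+\t-\n' | swap_pair_ends
```

Extra per-side columns (for example `cigar1`/`cigar2`) are not swapped by this sketch, and the original sort order no longer holds after the swap, so the output would need re-sorting (for example with `pairtools sort`) before any further use.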

### GIVE

20 changes: 10 additions & 10 deletions docs/build/html/_sources/step_by_step_illustration.md.txt
@@ -402,14 +402,14 @@ Here we describe each line of the log file (TAB separated text file).
Here is the output `pipelineStats_test_sample.log` file in our example.

```
Sequence mapping QC passed
(#unique_mapped_pairs + #single_side_unique_mapped)/#total_read_pairs 0.777777
#total_valid_interactions/#unique_mapped_pairs 0.761476
total_read_pairs 900000
single_side_unique_mapped 2
unique_mapped_pairs 699997
non_dup_unique_mapped_paris 691967
total_valid_interactions 533031
inter_chr 244163
intra_chr 288868
Sequence mapping QC passed
(#unique_mapped_pairs + #single_side_unique_mapped)/#total_read_pairs 0.785859
#total_valid_interactions/#nondup_unique_mapped_pairs 0.768195
total_read_pairs 900000
single_side_unique_mapped 3342
unique_mapped_pairs 703931
nondup_unique_mapped_pairs 694706
total_valid_interactions 533670
inter_chr 244208
intra_chr 289462
```
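The counters in the updated log are internally consistent: the first ratio is (`unique_mapped_pairs` + `single_side_unique_mapped`) / `total_read_pairs`, the second now uses `nondup_unique_mapped_pairs` as its denominator, and `inter_chr` + `intra_chr` equals `total_valid_interactions`. The sketch below recomputes these from a log; `check_pipeline_stats` is a hypothetical helper, and it assumes TAB-separated `name<TAB>value` lines with the counter names shown above.

```bash
# Hypothetical helper (not part of iMARGI): recompute the two QC ratios from
# a pipelineStats log and check that inter_chr + intra_chr adds up to
# total_valid_interactions. Reads the file(s) given as arguments, or stdin.
check_pipeline_stats() {
    awk -F'\t' '
        # Keep every "name<TAB>value" line; the QC status line has one field.
        NF == 2 { v[$1] = $2 }
        END {
            if (v["total_read_pairs"] + 0 == 0 || v["nondup_unique_mapped_pairs"] + 0 == 0) {
                print "missing or zero denominator" > "/dev/stderr"
                exit 1
            }
            mapped = v["unique_mapped_pairs"] + v["single_side_unique_mapped"]
            printf "mapped_ratio\t%.6f\n", mapped / v["total_read_pairs"]
            printf "valid_ratio\t%.6f\n", v["total_valid_interactions"] / v["nondup_unique_mapped_pairs"]
            # Inter- plus intra-chromosomal counts should add up to the total.
            if (v["inter_chr"] + v["intra_chr"] != v["total_valid_interactions"] + 0) {
                print "inter_chr + intra_chr != total_valid_interactions" > "/dev/stderr"
                exit 1
            }
        }' "$@"
}

# Feed the counters from the example log above.
printf '%s\t%s\n' total_read_pairs 900000 single_side_unique_mapped 3342 \
    unique_mapped_pairs 703931 nondup_unique_mapped_pairs 694706 \
    total_valid_interactions 533670 inter_chr 244208 intra_chr 289462 |
    check_pipeline_stats
```

For the example counts this prints `0.785859` and `0.768195`, matching the two ratios reported in the log, and the inter/intra check passes (244208 + 289462 = 533670).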
48 changes: 32 additions & 16 deletions docs/build/html/commandline_api.html
@@ -180,9 +180,11 @@ <h1>Command-line API<a class="headerlink" href="#command-line-api" title="Permal
<h2>imargi_wrapper.sh<a class="headerlink" href="#imargi-wrapper-sh" title="Permalink to this headline"></a></h2>
<p><a class="reference external" href="https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_wrapper.sh"><em>Source Code</em></a></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span> Usage: $PROGNAME [-r &lt;ref_name&gt;] [-N &lt;base_name&gt;] [-g &lt;ref_fasta&gt;]
[-c &lt;chromSize_file&gt;] [-i &lt;bwa_index&gt;] [-R &lt;restrict_sites&gt;]
[-G &lt;max_inter_align_gap&gt;] [-O &lt;offset_restriction_site&gt;] [-M &lt;max_ligation_size&gt;]
[-t &lt;threads&gt;] [-1 &lt;fastq.gz_R1&gt;] [-2 &lt;fastq.gz_R2&gt;] [-o &lt;output_dir&gt;]
[-c &lt;chromSize_file&gt;] [-i &lt;bwa_index&gt;] [-R &lt;restrict_sites&gt;]
[-Q &lt;min_mapq&gt;] [-G &lt;max_inter_align_gap&gt;]
[-O &lt;offset_restriction_site&gt;] [-M &lt;max_ligation_size&gt;] [-t &lt;threads&gt;]
[-1 &lt;fastq.gz_R1&gt;] [-2 &lt;fastq.gz_R2&gt;]
[-o &lt;output_dir&gt;]

Dependency: seqtk, samtools, bwa, pairtools, pbgzip

@@ -196,6 +198,7 @@ <h2>imargi_wrapper.sh<a class="headerlink" href="#imargi-wrapper-sh" title="Perm
-c : Chromosome size file.
-i : bwa index
-R : DNA restriction enzyme digestion fragments bed file.
-Q : Min MAPQ value for parsing as unique mapping. Default 1.
-G : Max inter align gap for pairtools parsing. Default 20. It will allow R1 5&#39; end clipping.
-O : Max offset bases for filtering pairs based on R2 5&#39; end positions to restriction sites. Default 3.
-M : Max size of ligation fragment for sequencing. It&#39;s used for filtering unligated DNA sequence.
@@ -255,7 +258,7 @@ <h2>imargi_rsfrags.sh<a class="headerlink" href="#imargi-rsfrags-sh" title="Perm
<div class="section" id="imargi-parse-sh">
<h2>imargi_parse.sh<a class="headerlink" href="#imargi-parse-sh" title="Permalink to this headline"></a></h2>
<p><a class="reference external" href="https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_parse.sh"><em>Source Code</em></a></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span> Usage: $PROGNAME [-r &lt;ref_name&gt;] [-c &lt;chromSize_file&gt;] [-R &lt;restrict_sites&gt;] [-b &lt;bam_file&gt;] [-o &lt;output_dir&gt;]
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span> Usage: $PROGNAME [-r &lt;ref_name&gt;] [-c &lt;chromSize_file&gt;] [-R &lt;restrict_sites&gt;] [-b &lt;bam_file&gt;] [-o &lt;output_dir&gt;]
[-Q &lt;min_mapq&gt;] [-G &lt;max_inter_align_gap&gt;] [-O &lt;offset_restriction_site&gt;] [-M &lt;max_ligation_size&gt;]
[-d &lt;drop&gt;] [-D &lt;intermediate_dir&gt;] [-t &lt;threads&gt;]

@@ -271,8 +274,8 @@ <h2>imargi_parse.sh<a class="headerlink" href="#imargi-parse-sh" title="Permalin
-o : Output directory
-Q : Min MAPQ value, default 1.
-G : Max inter align gap for pairtools parsing. Default 20. It will allow R1 5&#39; end clipping.
-O : Max mis-offset bases for filtering pairs based on R2 5&#39; end positions to restriction sites. Default 0.
-M : Max size of ligation fragment for sequencing. It&#39;s used for filtering unligated DNA sequence.
-O : Max mis-offset bases for filtering pairs based on R2 5&#39; end positions to restriction sites. Default 3.
-M : Max size of ligation fragment for sequencing. It&#39;s used for filtering unligated DNA sequence. Default 1000.
-d : Flag of dropping. Default is false, i.e., output all the intermediate results.
-D : Directory for intermediate results. Works when -d false. Default is a sub-folder &quot;intermediate_results&quot;
in output directory.
@@ -329,20 +332,33 @@ <h2>imargi_distfilter.sh<a class="headerlink" href="#imargi-distfilter-sh" title
<div class="section" id="imargi-convert-sh">
<h2>imargi_convert.sh<a class="headerlink" href="#imargi-convert-sh" title="Permalink to this headline"></a></h2>
<p><a class="reference external" href="https://github.com/Zhong-Lab-UCSD/iMARGI-Docker/blob/master/src/imargi_convert.sh"><em>Source Code</em></a></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span> Usage: <span class="nv">$PROGNAME</span> <span class="o">[</span>-f &lt;file_format&gt;<span class="o">]</span> <span class="o">[</span>-k &lt;keep_cols&gt;<span class="o">]</span> <span class="o">[</span>-b &lt;bin_size&gt;<span class="o">]</span> <span class="o">[</span>-i &lt;input_file&gt;<span class="o">]</span> <span class="o">[</span>-o &lt;output_file&gt;<span class="o">]</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span> Usage: $PROGNAME [-f &lt;file_format&gt;] [-k &lt;keep_cols&gt;]
[-b &lt;bin_size&gt;] [-r &lt;resolution&gt;] [-T &lt;transpose&gt;]
[-i &lt;input_file&gt;] [-o &lt;output_file&gt;]

Dependency: gzip, awk, cool
This script can convert .pairs format to BEDPE, .cool, and GIVE interaction format.
-f : The target format, only accept <span class="s1">&#39;cool&#39;</span>, <span class="s1">&#39;bedpe&#39;</span> and <span class="s1">&#39;give&#39;</span>. For <span class="s1">&#39;cool&#39;</span>, it will generate
a <span class="s2">&quot;.cool&quot;</span> file with defined resolution of -b option and a multi-resolution <span class="s2">&quot;.mcool&quot;</span> file
based on the <span class="s2">&quot;.cool&quot;</span> file. For <span class="s1">&#39;bedpe&#39;</span>, the output will be pbgzip compressed file. So
keep in mind to name the output_file <span class="s1">&#39;-o&#39;</span> with <span class="s1">&#39;.gz&#39;</span> extesion.
-k : Keep extra information column in BEDPE. Columns ids in .pairs file you want to keep.
For example, <span class="s1">&#39;cigar1,cigar2&#39;</span>. Default value is <span class="s2">&quot;&quot;</span>, i.e., drop all extra cols.
-b : bin size <span class="k">for</span> cool format. Default is <span class="m">5000</span>.
-f : The target format, only accept &#39;cool&#39;, &#39;bedpe&#39; and &#39;give&#39;. For &#39;cool&#39;, it will generate a &quot;.cool&quot; file
with defined resolution of -b option and a -r defined multi-resolution &quot;.mcool&quot; file based on the &quot;.cool&quot; file.
For &#39;bedpe&#39;, the output will be pbgzip compressed file. So keep in mind to name the output_file &#39;-o&#39; with
&#39;.gz&#39; extension. For &#39;give&#39;, the output is a normal text file.
-k : (Only for BEDPE) Keep extra information column in BEDPE. Columns ids in .pairs file you want to keep.
For example, &#39;cigar1,cigar2&#39;. Default value is &quot;&quot;, i.e., drop all extra cols.
-b : (Only for cool/mcool) bin size for cool format. Default is 1000.
-r : (Only for cool/mcool) resolution for cool/mcool format. Integers separated by comma. The values of resolution
must be integer multiples of the bin size defined by -b option.
Default is 1000,2000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000,5000000,10000000
-T : (Only for cool/mcool) mcool file can be visualized by HiGlass. Currently the heatmap orientation cannot be
set in HiGlass control panel. So if you want to transpose the interaction map in HiGlass, you need to generate
a transposed mcool file. The default value is &#39;false&#39;, i.e., no transpose, the RNA-DNA interactions will be
mapped to an X-Y system as a DNA x RNA contact matrix. If set &#39;-T true&#39;, then the generated cool/mcool file is
transposed, which is RNA x DNA contact matrix.
-i : Input file.
-o : Output file. BEDPE output is gzip compressed file. cool output are .cool and .mcool files.
-h : Show usage <span class="nb">help</span>
-o : Output file.
BEDPE output is gzip compressed file, so it&#39;s better to have a .gz file extension.
cool output are two files, .cool and .mcool. The -o option assigns the name of .cool file, it must use .cool as
extension. The .mcool file will be generated based on the .cool file with .mcool extension.
-h : Show usage help
</pre></div>
</div>
</div>
9 changes: 8 additions & 1 deletion docs/build/html/further_analysis.html
@@ -295,7 +295,14 @@ <h3>HiGlass<a class="headerlink" href="#higlass" title="Permalink to this headli
convert .pairs file to .cool/.mcool file.
Please read the <a class="reference external" href="https://github.com/higlass/higlass/wiki">HiGlass documentation</a> to learn how to use it. There
is also a Jupyter Notebook version of HiGlass, <a class="reference external" href="https://github.com/higlass/higlass-jupyter">jupyter-higlass</a>.</p>
<p><img alt="_images/higlass_view.png" src="_images/higlass_view.png" />HiGLass view</p>
<p><strong>Note:</strong> In the iMARGI <code class="docutils literal notranslate"><span class="pre">.pairs</span></code> file, the RNA coordinate is <code class="docutils literal notranslate"><span class="pre">c1:p1</span></code> and the DNA coordinate is <code class="docutils literal notranslate"><span class="pre">c2:p2</span></code>. The
<code class="docutils literal notranslate"><span class="pre">imargi_convert.sh</span></code> script can generate a <code class="docutils literal notranslate"><span class="pre">.mcool</span></code> file for HiGlass directly. When HiGlass renders the heatmap from the
<code class="docutils literal notranslate"><span class="pre">.mcool</span></code> file, it uses an X-Y coordinate system in which X is <code class="docutils literal notranslate"><span class="pre">c1:p1</span></code> and Y is <code class="docutils literal notranslate"><span class="pre">c2:p2</span></code>, so the heatmap shows a
DNA x RNA matrix, i.e., rows are DNA and columns are RNA (as in the figure below). Currently, to transpose the map,
you have to generate a transposed <code class="docutils literal notranslate"><span class="pre">.mcool</span></code> file by setting <code class="docutils literal notranslate"><span class="pre">-T</span> <span class="pre">true</span></code> when running <code class="docutils literal notranslate"><span class="pre">imargi_convert.sh</span></code>. The HiGlass team
will add a “customizable transpose” function to its control panel in its next version, after which this step
will no longer be necessary.</p>
<p><img alt="_images/higlass_view.png" src="_images/higlass_view.png" />HiGlass view (row is DNA and column is RNA)</p>
</div>
<div class="section" id="give">
<h3>GIVE<a class="headerlink" href="#give" title="Permalink to this headline"></a></h3>
2 changes: 1 addition & 1 deletion docs/build/html/searchindex.js
