refactoring module docstring
ens-ftricomi committed Sep 28, 2023
1 parent 35a97ec commit a60e12d
Showing 56 changed files with 946 additions and 375 deletions.
2 changes: 1 addition & 1 deletion docs/build/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 88dee1dfb417bc401516c5f62c9b42d9
config: 6ef85c61a07ec8e9f0ed07676e851c59
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file modified docs/build/.doctrees/cpg.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/dust.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/.doctrees/eponine.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/genblast.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/index.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/minimap.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/red.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/repeatmasker.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/scallop.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/star.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/stringtie.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/trf.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/trnascan.doctree
Binary file not shown.
@@ -57,7 +57,6 @@ <h1>Source code for ensembl.tools.anno.protein_annotation.genblast</h1><div clas
<span class="sd">the sequences have undergone significant evolutionary changes.</span>
<span class="sd">This capability makes it a valuable resource for researchers studying gene</span>
<span class="sd">evolution, gene families, and gene function across diverse species.</span>

<span class="sd">GenBlast has been widely used in various genomic analyses and is available as</span>
<span class="sd">a standalone command-line tool or as part of different bioinformatics pipelines.</span>
<span class="sd">Researchers in the field of comparative genomics and gene function analysis</span>
@@ -110,17 +109,31 @@ <h1>Source code for ensembl.tools.anno.protein_annotation.genblast</h1><div clas
<span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Executes GenBlast on genomic slices</span>
<span class="sd"> Args:</span>
<span class="sd"> masked_genome : Masked genome file path.</span>
<span class="sd"> output_dir: Working directory path.</span>
<span class="sd"> protein_dataset: Protein dataset (Uniprot/OrthoDb) path.</span>
<span class="sd"> genblast_timeout_secs: Time for timeout (sec).</span>
<span class="sd"> max_intron_length: Maximum intron length.</span>
<span class="sd"> genblast_bin : Software path.</span>
<span class="sd"> convert2blastmask_bin: Software path.</span>
<span class="sd"> makeblastdb_bin : Software path.</span>
<span class="sd"> genblast_timeout: seconds</span>
<span class="sd"> num_threads: int, number of threads.</span>
<span class="sd"> :param masked_genome: Masked genome file path.</span>
<span class="sd"> :type masked_genome: Path</span>
<span class="sd"> :param output_dir: Working directory path.</span>
<span class="sd"> :type output_dir: Path</span>
<span class="sd"> :param protein_dataset: Protein dataset (Uniprot/OrthoDb) path.</span>
<span class="sd"> :type protein_dataset: Path</span>
<span class="sd"> :param genblast_timeout_secs: Time for timeout (sec).</span>
<span class="sd"> :type genblast_timeout_secs: int, default 10800</span>
<span class="sd"> :param max_intron_length: Maximum intron length.</span>
<span class="sd"> :type max_intron_length: int </span>
<span class="sd"> :param genblast_bin: Software path.</span>
<span class="sd"> :type genblast_bin: Path, default genblast</span>
<span class="sd"> :param convert2blastmask_bin: Software path.</span>
<span class="sd"> :type convert2blastmask_bin: Path, default convert2blastmask</span>
<span class="sd"> :param makeblastdb_bin: Software path.</span>
<span class="sd"> :type makeblastdb_bin: Path, default makeblastdb</span>
<span class="sd"> :param genblast_timeout: seconds</span>
<span class="sd"> :type genblast_timeout: int, default 1</span>
<span class="sd"> :param num_threads: Number of threads.</span>
<span class="sd"> :type num_threads: int, default 1</span>
<span class="sd"> :param protein_set: Protein dataset source.</span>
<span class="sd"> :type protein_set: str, one of [&quot;uniprot&quot;, &quot;orthodb&quot;]</span>
<span class="sd"> </span>
<span class="sd"> :return: None</span>
<span class="sd"> :rtype: None</span>
<span class="sd"> &quot;&quot;&quot;</span>

<span class="n">check_exe</span><span class="p">(</span><span class="n">genblast_bin</span><span class="p">)</span>
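The refactored genblast docstring switches from a Google-style `Args:` block to Sphinx field lists, where every `:type NAME:` is expected to name a real parameter. A field such as `:type str:` does not match any parameter. The sketch below is a hypothetical helper (`check_docstring_fields` and the toy `run_tool` are not part of the repository) that uses only `inspect` and `re` to flag undocumented parameters and `:type` fields with no matching parameter; it deliberately ignores inline-typed fields like `:param int x:`.

```python
import inspect
import re
from typing import Callable, Dict, List

# Sphinx/reST field lists: ":param NAME: ..." and ":type NAME: ...".
# Inline-typed fields such as ":param int x:" are deliberately not matched.
PARAM_RE = re.compile(r"^:param\s+(\w+)\s*:", re.MULTILINE)
TYPE_RE = re.compile(r"^:type\s+(\w+)\s*:", re.MULTILINE)


def check_docstring_fields(func: Callable) -> Dict[str, List[str]]:
    """Compare a function's signature with its :param:/:type: fields."""
    doc = inspect.getdoc(func) or ""  # getdoc strips the common indentation
    params = [p for p in inspect.signature(func).parameters if p not in ("self", "cls")]
    documented = set(PARAM_RE.findall(doc))
    typed = TYPE_RE.findall(doc)
    return {
        # parameters that have no ":param NAME:" entry
        "missing_param": [p for p in params if p not in documented],
        # ":type NAME:" entries whose NAME is not a parameter
        "unknown_type": [t for t in typed if t not in params],
    }


def run_tool(masked_genome, num_threads=1, protein_set="uniprot"):
    """
    Toy docstring written in the same field-list style (hypothetical function).

    :param masked_genome: Masked genome file path.
    :type masked_genome: Path
    :param num_threads: Number of threads.
    :type num_threads: int, default 1
    :param protein_set: Protein dataset source.
    :type str: ["uniprot", "orthodb"]
    """


print(check_docstring_fields(run_tool))
# {'missing_param': [], 'unknown_type': ['str']}
```

A check like this could run in a test suite or pre-commit hook so that field-name mismatches are caught before the Sphinx build regenerates `docs/build`.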
35 changes: 16 additions and 19 deletions docs/build/_modules/ensembl/tools/anno/repeat_annotation/dust.html
@@ -95,11 +95,17 @@ <h1>Source code for ensembl.tools.anno.repeat_annotation.dust</h1><div class="hi
<span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Run Dust on genomic slices with multiprocessing</span>
<span class="sd"> Args:</span>
<span class="sd"> genome_file : Genome file path.</span>
<span class="sd"> output_dir : Working directory path.</span>
<span class="sd"> dust_bin : Dust software path.</span>
<span class="sd"> num_threads: Number of threads.</span>
<span class="sd"> :param genome_file: Genome file path.</span>
<span class="sd"> :type genome_file: PathLike</span>
<span class="sd"> :param output_dir: Working directory path.</span>
<span class="sd"> :type output_dir: Path</span>
<span class="sd"> :param dust_bin: Dust software path.</span>
<span class="sd"> :type dust_bin: Path, default dustmasker</span>
<span class="sd"> :param num_threads: Number of threads.</span>
<span class="sd"> :type num_threads: int, default 1</span>
<span class="sd"> </span>
<span class="sd"> :return: None</span>
<span class="sd"> :rtype: None</span>
<span class="sd"> &quot;&quot;&quot;</span>

<span class="n">check_exe</span><span class="p">(</span><span class="n">dust_bin</span><span class="p">)</span>
@@ -113,9 +119,7 @@ <h1>Source code for ensembl.tools.anno.repeat_annotation.dust</h1><div class="hi
<span class="k">return</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&quot;Creating list of genomic slices&quot;</span><span class="p">)</span>
<span class="n">seq_region_to_length</span> <span class="o">=</span> <span class="n">get_seq_region_length</span><span class="p">(</span><span class="n">genome_file</span><span class="p">,</span> <span class="mi">5000</span><span class="p">)</span>
<span class="n">slice_ids_per_region</span> <span class="o">=</span> <span class="n">get_slice_id</span><span class="p">(</span>
<span class="n">seq_region_to_length</span><span class="p">,</span> <span class="n">slice_size</span><span class="o">=</span><span class="mi">1000000</span><span class="p">,</span> <span class="n">overlap</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">min_length</span><span class="o">=</span><span class="mi">5000</span>
<span class="p">)</span>
<span class="n">slice_ids_per_region</span> <span class="o">=</span> <span class="n">get_slice_id</span><span class="p">(</span><span class="n">seq_region_to_length</span><span class="p">,</span> <span class="n">slice_size</span><span class="o">=</span><span class="mi">1000000</span><span class="p">,</span> <span class="n">overlap</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">min_length</span><span class="o">=</span><span class="mi">5000</span><span class="p">)</span>
<span class="n">dust_cmd</span> <span class="o">=</span> <span class="p">[</span><span class="n">dust_bin</span><span class="p">,</span> <span class="s2">&quot;-in&quot;</span><span class="p">]</span>
<span class="n">pool</span> <span class="o">=</span> <span class="n">multiprocessing</span><span class="o">.</span><span class="n">Pool</span><span class="p">(</span><span class="n">num_threads</span><span class="p">)</span> <span class="c1"># pylint: disable=consider-using-with</span>
<span class="k">for</span> <span class="n">slice_id</span> <span class="ow">in</span> <span class="n">slice_ids_per_region</span><span class="p">:</span>
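The dust hunk builds its work list with `get_slice_id(seq_region_to_length, slice_size=1000000, overlap=0, min_length=5000)`, but the helper's body is not part of this diff. The following is a minimal sketch of how such fixed-size slicing could work, assuming 1-based inclusive coordinates, windows that advance by `slice_size - overlap`, and regions shorter than `min_length` being skipped. `make_slices` is a hypothetical stand-in, not the module's implementation.

```python
from typing import Dict, List, Tuple


def make_slices(
    seq_region_to_length: Dict[str, int],
    slice_size: int = 1_000_000,
    overlap: int = 0,
    min_length: int = 5000,
) -> List[Tuple[str, int, int]]:
    """Split regions into (name, start, end) windows, 1-based and inclusive.

    Hypothetical stand-in for get_slice_id; its real behaviour may differ.
    """
    if not 0 <= overlap < slice_size:
        raise ValueError("overlap must satisfy 0 <= overlap < slice_size")
    step = slice_size - overlap  # consecutive windows share `overlap` bases
    slices: List[Tuple[str, int, int]] = []
    for region, length in seq_region_to_length.items():
        if length < min_length:
            continue  # skip regions too short to be worth a job
        start = 1
        while True:
            end = min(start + slice_size - 1, length)
            slices.append((region, start, end))
            if end == length:
                break  # the last window is truncated at the region end
            start += step
    return slices


print(make_slices({"chr1": 2_500_000, "scaffold_9": 100}))
# [('chr1', 1, 1000000), ('chr1', 1000001, 2000000), ('chr1', 2000001, 2500000)]
```

Each tuple is an independent unit of work, which is what makes it straightforward to hand the list to a `multiprocessing.Pool` as the loop above does.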
@@ -197,8 +201,7 @@ <h1>Source code for ensembl.tools.anno.repeat_annotation.dust</h1><div class="hi
<span class="n">start</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">result_match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">end</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">result_match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">gtf_line</span> <span class="o">=</span> <span class="p">(</span>
<span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">region_name</span><span class="si">}</span><span class="se">\t</span><span class="s2">Dust</span><span class="se">\t</span><span class="s2">repeat</span><span class="se">\t</span><span class="si">{</span><span class="n">start</span><span class="si">}</span><span class="se">\t</span><span class="s2">&quot;</span>
<span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">end</span><span class="si">}</span><span class="se">\t</span><span class="s1">.</span><span class="se">\t</span><span class="s1">+</span><span class="se">\t</span><span class="s1">.</span><span class="se">\t</span><span class="s1">repeat_id &quot;</span><span class="si">{</span><span class="n">repeat_count</span><span class="si">}</span><span class="s1">&quot;;</span><span class="se">\n</span><span class="s1">&#39;</span>
<span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">region_name</span><span class="si">}</span><span class="se">\t</span><span class="s2">Dust</span><span class="se">\t</span><span class="s2">repeat</span><span class="se">\t</span><span class="si">{</span><span class="n">start</span><span class="si">}</span><span class="se">\t</span><span class="s2">&quot;</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">end</span><span class="si">}</span><span class="se">\t</span><span class="s1">.</span><span class="se">\t</span><span class="s1">+</span><span class="se">\t</span><span class="s1">.</span><span class="se">\t</span><span class="s1">repeat_id &quot;</span><span class="si">{</span><span class="n">repeat_count</span><span class="si">}</span><span class="s1">&quot;;</span><span class="se">\n</span><span class="s1">&#39;</span>
<span class="p">)</span>
<span class="n">dust_out</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">gtf_line</span><span class="p">)</span>
<span class="n">repeat_count</span> <span class="o">+=</span> <span class="mi">1</span>
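The hunk above shifts both captured coordinates by one before writing a nine-column GTF `repeat` record. A self-contained sketch of that conversion follows, assuming dustmasker's interval output has the shape `<start> - <end>` with 0-based inclusive coordinates under a `>sequence` header. The regex `INTERVAL_RE` and the function `dust_intervals_to_gtf` are hypothetical; the module's actual pattern is not shown in this diff.

```python
import re
from typing import Iterable, List

# Assumed dustmasker interval line: "<start> - <end>", 0-based inclusive.
# The module's real regex is not part of this diff.
INTERVAL_RE = re.compile(r"^(\d+)\s+-\s+(\d+)$")


def dust_intervals_to_gtf(region_name: str, lines: Iterable[str], repeat_count: int = 1) -> List[str]:
    """Turn interval lines into GTF 'repeat' records (1-based coordinates)."""
    gtf_lines = []
    for line in lines:
        result_match = INTERVAL_RE.match(line.strip())
        if not result_match:
            continue  # e.g. the ">sequence_id" header line
        start = int(result_match.group(1)) + 1  # same +1 shift as in the diff
        end = int(result_match.group(2)) + 1
        gtf_lines.append(
            f"{region_name}\tDust\trepeat\t{start}\t"
            f'{end}\t.\t+\t.\trepeat_id "{repeat_count}";\n'
        )
        repeat_count += 1
    return gtf_lines


for record in dust_intervals_to_gtf("chr1", [">chr1", "0 - 9", "100 - 149"]):
    print(record, end="")
```

Shifting both ends by one converts a 0-based inclusive interval into the 1-based inclusive coordinates GTF expects, so the feature length (`end - start + 1`) is unchanged.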
@@ -207,20 +210,14 @@ <h1>Source code for ensembl.tools.anno.repeat_annotation.dust</h1><div class="hi
<span class="k">class</span> <span class="nc">InputSchema</span><span class="p">(</span><span class="n">argschema</span><span class="o">.</span><span class="n">ArgSchema</span><span class="p">):</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;Input arguments expected to run DustMasker.&quot;&quot;&quot;</span>

<span class="n">genome_file</span> <span class="o">=</span> <span class="n">argschema</span><span class="o">.</span><span class="n">fields</span><span class="o">.</span><span class="n">InputFile</span><span class="p">(</span>
<span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="s2">&quot;Genome file path&quot;</span>
<span class="p">)</span>
<span class="n">output_dir</span> <span class="o">=</span> <span class="n">argschema</span><span class="o">.</span><span class="n">fields</span><span class="o">.</span><span class="n">OutputDir</span><span class="p">(</span>
<span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="s2">&quot;Output directory path&quot;</span>
<span class="p">)</span>
<span class="n">genome_file</span> <span class="o">=</span> <span class="n">argschema</span><span class="o">.</span><span class="n">fields</span><span class="o">.</span><span class="n">InputFile</span><span class="p">(</span><span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="s2">&quot;Genome file path&quot;</span><span class="p">)</span>
<span class="n">output_dir</span> <span class="o">=</span> <span class="n">argschema</span><span class="o">.</span><span class="n">fields</span><span class="o">.</span><span class="n">OutputDir</span><span class="p">(</span><span class="n">required</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="s2">&quot;Output directory path&quot;</span><span class="p">)</span>
<span class="n">dust_bin</span> <span class="o">=</span> <span class="n">argschema</span><span class="o">.</span><span class="n">fields</span><span class="o">.</span><span class="n">String</span><span class="p">(</span>
<span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="n">default</span><span class="o">=</span><span class="s2">&quot;dustmasker&quot;</span><span class="p">,</span>
<span class="n">description</span><span class="o">=</span><span class="s2">&quot;Dust executable path&quot;</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">num_threads</span> <span class="o">=</span> <span class="n">argschema</span><span class="o">.</span><span class="n">fields</span><span class="o">.</span><span class="n">Integer</span><span class="p">(</span>
<span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="s2">&quot;Number of threads&quot;</span>
<span class="p">)</span>
<span class="n">num_threads</span> <span class="o">=</span> <span class="n">argschema</span><span class="o">.</span><span class="n">fields</span><span class="o">.</span><span class="n">Integer</span><span class="p">(</span><span class="n">required</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="s2">&quot;Number of threads&quot;</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
@@ -80,14 +80,15 @@ <h1>Source code for ensembl.tools.anno.repeat_annotation.red</h1><div class="hig
<span class="k">def</span> <span class="nf">run_red</span><span class="p">(</span><span class="n">genome_file</span><span class="p">:</span> <span class="n">Path</span><span class="p">,</span> <span class="n">output_dir</span><span class="p">:</span> <span class="n">Path</span><span class="p">,</span> <span class="n">red_bin</span><span class="p">:</span> <span class="n">Path</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s2">&quot;Red&quot;</span><span class="p">),)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Run Red on genome file</span>

<span class="sd"> Args:</span>
<span class="sd"> genome_file : Genome file path.</span>
<span class="sd"> output_dir : Working directory path.</span>
<span class="sd"> red_bin : Red software path.</span>

<span class="sd"> Return:</span>
<span class="sd"> masked genome file</span>
<span class="sd"> :param genome_file: Genome file path.</span>
<span class="sd"> :type genome_file: Path</span>
<span class="sd"> :param output_dir: Working directory path.</span>
<span class="sd"> :type output_dir: Path</span>
<span class="sd"> :param red_bin: Red software path.</span>
<span class="sd"> :type red_bin: Path, default Red</span>
<span class="sd"> </span>
<span class="sd"> :return: Masked genome file</span>
<span class="sd"> :rtype: str</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="n">check_exe</span><span class="p">(</span><span class="n">red_bin</span><span class="p">)</span>
<span class="n">red_dir</span> <span class="o">=</span> <span class="n">create_dir</span><span class="p">(</span><span class="n">output_dir</span><span class="p">,</span> <span class="s2">&quot;red_output&quot;</span><span class="p">)</span>
@@ -96,14 +96,24 @@ <h1>Source code for ensembl.tools.anno.repeat_annotation.repeatmasker</h1><div c

<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Executes RepeatMasker on the genome slices and stores the final annotation.gtf in repeatmasker_output</span>
<span class="sd"> Args:</span>
<span class="sd"> genome_file : Genome file path.</span>
<span class="sd"> repeatmasker_path : RepeatMasker executable path.</span>
<span class="sd"> library : Custom repeat library.</span>
<span class="sd"> species :Species name.</span>
<span class="sd"> output_dir : Output directory path.</span>
<span class="sd"> num_threads: Number of threads.</span>

<span class="sd"> :param genome_file: Genome file path.</span>
<span class="sd"> :type genome_file: PathLike</span>
<span class="sd"> :param output_dir: Output directory path.</span>
<span class="sd"> :type output_dir: Path</span>
<span class="sd"> :param repeatmasker_bin: RepeatMasker executable path.</span>
<span class="sd"> :type repeatmasker_bin: Path, default RepeatMasker</span>
<span class="sd"> :param library: Custom repeat library.</span>
<span class="sd"> :type library: str</span>
<span class="sd"> :param repeatmasker_engine: RepeatMasker engine.</span>
<span class="sd"> :type repeatmasker_engine: str, default rmblast</span>
<span class="sd"> :param species: Species name.</span>
<span class="sd"> :type species: str</span>
<span class="sd"> :param num_threads: Number of threads.</span>
<span class="sd"> :type num_threads: int, default 1</span>
<span class="sd"> </span>
<span class="sd"> :return: None</span>
<span class="sd"> :rtype: None</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="n">check_exe</span><span class="p">(</span><span class="n">repeatmasker_bin</span><span class="p">)</span>
<span class="n">repeatmasker_dir</span> <span class="o">=</span> <span class="n">create_dir</span><span class="p">(</span><span class="n">output_dir</span><span class="p">,</span> <span class="s2">&quot;repeatmasker_output&quot;</span><span class="p">)</span>