Merge pull request #4796 from abueg/abueg11mar2024
vgp assembly: replaced est. genome size calculation with upload via paste data
bgruening authored Mar 13, 2024
2 parents aca5425 + d832d52 commit e6047df
Showing 2 changed files with 14 additions and 21 deletions.
35 changes: 14 additions & 21 deletions topics/assembly/tutorials/vgp_genome_assembly/tutorial.md
@@ -913,19 +913,13 @@ Now that we have looked at our primary assembly with multiple {QC} metrics, we k
Before proceeding to purging, we need to carry out some text manipulation operations on the output generated by GenomeScope2 to make it compatible with downstream tools. The goal is to extract some parameters which at a later stage will be used by **purge_dups**.
### Parsing **purge_dups** cutoffs from **GenomeScope2** output
### Getting **purge_dups** cutoffs from **GenomeScope2** output
The first relevant parameter is the `estimated genome size`.
> <hands-on-title>Get estimated genome size</hands-on-title>
>
>**Step 1**: Open {% tool [Replace parts of text](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4) %}
><br>
>**Step 2**: Scroll down to find *"+ Insert Find and Replace"* button and click it.
><br>
>**Step 3**: Scroll down again to find *"+ Insert Find and Replace"* button and click it again. After this you should have *"Find and Replace"* panel repeated three times: *"1: Find and Replace"*, *"2: Find and Replace"*, and *"3: Find and Replace"*.
><br>
>**Step 4**: In {% icon param-file %} *"File to process"*: Select `GenomeScope summary` output (generated during *k*-mer profiling [step](#genome-profiling-with-genomescope2)). The input file should have content that looks like this (it may not be exactly like this):
>**Step 1**: Look at the `GenomeScope summary` output (generated during *k*-mer profiling [step](#genome-profiling-with-genomescope2)). The file should have content that looks like this (it may not be exactly like this):
> ```
> GenomeScope version 2.0
> input file = ....
@@ -943,23 +937,22 @@ The first relevant parameter is the `estimated genome size`.
> Model Fit 92.5159% 96.5191%
> Read Error Rate 0.000943206% 0.000943206%
>```
><br>
>**Step 2**: Copy the number value for the maximum Genome Haploid Length to your clipboard (CTRL + C on Windows; CMD + C on MacOS).
>
>**Step 5**: In the first Find and Replace panel *"1: Find and Replace"* set the following parameters:
> 1. *"Find pattern"*: `^(?!Genome Haploid Length).*\n`
> 2. *"Find-Pattern is a regular expression"*: Toggle to `Yes`
>**Step 3**: Click on "Upload Data" in the toolbox on the left.
>
><br>
>**Step 6**: In the second Find and Replace panel *"2: Find and Replace"* set the following parameters:
> 1. *"Find pattern"*: `Genome Haploid Length\s+(\d{1,3}(?:,\d{3})*\s+bp)\s+(\d{1,3}(?:,\d{3})*)\s+bp`
> 2. *"Replace with"*: `$2`
> 3. *"Find-Pattern is a regular expression"*: Toggle to `Yes`
>**Step 4**: Click on "Paste/Fetch data".
>
><br>
>**Step 7**: In the third Find and Replace panel *"3: Find and Replace"* set the following parameters:
>*"Find pattern"*: `,` (Yes, just a comma)
>**Step 5**: Change `New File` to `Estimated genome size`.
>
><br>
>**Step 8**: Rename the output as `Estimated genome size`.
>**Step 6**: Paste the maximum Genome Haploid Length into the text box.
>
>**Step 7**: Remove the commas from the number! We only want integers.
>
>**Step 8**: Click "Start".
>
> ![Image showing where to click to upload data as pasted data.](../../images/vgp_assembly/paste_data_to_upload.png "Use the 'paste data' dialog to upload a file with the estimated genome size.")
>
> > <question-title></question-title>
> >
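
For readers who would rather script this than copy and paste, the manual steps in this change (take the maximum Genome Haploid Length from the GenomeScope summary, strip the commas, keep the integer) can be sketched in Python. This is a minimal sketch, not part of the tutorial or the commit: the function name `max_haploid_length` is hypothetical, the pattern is adapted from the regular expression this commit removes, and the numbers in `example_summary` are made up for illustration.

```python
import re

# Adapted from the "Find pattern" removed in this commit: the summary line
# holds a minimum and a maximum length, each written with comma thousands
# separators and followed by the unit "bp".
HAPLOID_LENGTH = re.compile(
    r"Genome Haploid Length\s+(\d{1,3}(?:,\d{3})*)\s+bp\s+(\d{1,3}(?:,\d{3})*)\s+bp"
)


def max_haploid_length(summary_text: str) -> int:
    """Return the maximum Genome Haploid Length as a plain integer."""
    match = HAPLOID_LENGTH.search(summary_text)
    if match is None:
        raise ValueError("no 'Genome Haploid Length' line found")
    # Group 2 is the maximum estimate; drop the commas before converting.
    return int(match.group(2).replace(",", ""))


# Illustrative input only: these values are invented, not from a real run.
example_summary = """\
GenomeScope version 2.0
property                      min               max
Genome Haploid Length         1,234,567 bp      1,240,000 bp
Model Fit                     92.5159%          96.5191%
"""

estimated_genome_size = max_haploid_length(example_summary)
print(estimated_genome_size)
```

The printed integer is the value that would be pasted into the `Estimated genome size` dataset in the upload dialog.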