Merge pull request #4796 from abueg/abueg11mar2024

vgp assembly: replaced est. genome size calculation with upload via paste data
galaxyproject · Mar 13, 2024 · e6047df
2 parents aca5425 + d832d52
commit e6047df
Showing 2 changed files with 14 additions and 21 deletions.
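
For anyone who wants to cross-check the value that the new manual steps ask users to paste (the maximum Genome Haploid Length, with commas removed), the same number can be extracted programmatically. The following is a minimal Python sketch and is not part of this pull request. The function name `max_haploid_length` and the sample summary text (including its numbers) are hypothetical, and the pattern is adapted from the regular expression removed in the diff, so it assumes the summary line has the layout `Genome Haploid Length  <min> bp  <max> bp`.

```python
import re

# Adapted from the find/replace pattern removed in this diff.
# Assumed layout: "Genome Haploid Length   <min> bp   <max> bp"
_HAPLOID_LENGTH = re.compile(
    r"^Genome Haploid Length\s+(\d{1,3}(?:,\d{3})*)\s+bp\s+(\d{1,3}(?:,\d{3})*)\s+bp",
    re.MULTILINE,
)


def max_haploid_length(summary_text: str) -> int:
    """Return the maximum estimated haploid genome length as a plain integer."""
    match = _HAPLOID_LENGTH.search(summary_text)
    if match is None:
        raise ValueError("no 'Genome Haploid Length' line found in summary")
    # Group 2 is the 'max' column; drop the thousands separators.
    return int(match.group(2).replace(",", ""))


# Hypothetical summary text; the numbers are illustrative only.
sample = """GenomeScope version 2.0
property                      min               max
Genome Haploid Length         11,739,321 bp     11,747,160 bp
Model Fit                     92.5159%          96.5191%
"""

print(max_haploid_length(sample))  # 11747160
```

The integer printed here corresponds to what the tutorial calls `Estimated genome size` after the commas are removed in Step 7.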
diff --git a/topics/assembly/images/vgp_assembly/paste_data_to_upload.png b/topics/assembly/images/vgp_assembly/paste_data_to_upload.png
diff --git a/topics/assembly/tutorials/vgp_genome_assembly/tutorial.md b/topics/assembly/tutorials/vgp_genome_assembly/tutorial.md
@@ -913,19 +913,13 @@ Now that we have looked at our primary assembly with multiple {QC} metrics, we k
 
 Before proceeding to purging, we need to carry out some text manipulation operations on the output generated by GenomeScope2 to make it compatible with downstream tools. The goal is to extract some parameters which at a later stage will be used by **purge_dups**.
 
-### Parsing **purge_dups** cutoffs from **GenomeScope2** output
+### Getting **purge_dups** cutoffs from **GenomeScope2** output
 
 The first relevant parameter is the `estimated genome size`.
 
 > <hands-on-title>Get estimated genome size</hands-on-title>
 >
->**Step 1**: Open {% tool [Replace parts of text](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4) %}
-><br>
->**Step 2**: Scroll down to find *"+ Insert Find and Replace"* button and click it.
-><br>
->**Step 3**: Scroll down again to find *"+ Insert Find and Replace"* button and click it again. After this you should have *"Find and Replace"* panel repeated three times: *"1: Find and Replace"*, *"2: Find and Replace"*, and *"3: Find and Replace"*. 
-><br>
->**Step 4**: In {% icon param-file %} *"File to process"*: Select `GenomeScope summary` output (generated during *k*-mer profiling [step](#genome-profiling-with-genomescope2)). The input file should have content that looks like this (it may not be exactly like this):
+>**Step 1**: Look at the `GenomeScope summary` output (generated during *k*-mer profiling [step](#genome-profiling-with-genomescope2)). The file should have content that looks like this (it may not be exactly like this):
 > ```
 > GenomeScope version 2.0
 > input file = ....
@@ -943,23 +937,22 @@ The first relevant parameter is the `estimated genome size`.
 > Model Fit                     92.5159%          96.5191%          
 > Read Error Rate               0.000943206%      0.000943206%   
 >``` 
+><br>
+>**Step 2**: Copy the number value for the maximum Genome Haploid Length to your clipboard (CTRL + C on Windows; CMD + C on MacOS).
 >
->**Step 5**: In the first Find and Replace panel *"1: Find and Replace"* set the following parameters:  
-> 1. *"Find pattern"*: `^(?!Genome Haploid Length).*\n`
-> 2. *"Find-Pattern is a regular expression"*: Toggle to `Yes`
+>**Step 3**: Click on "Upload Data" in the toolbox on the left.
 >
-><br>
->**Step 6**: In the second Find and Replace panel *"2: Find and Replace"* set the following parameters:  
-> 1. *"Find pattern"*: `Genome Haploid Length\s+(\d{1,3}(?:,\d{3})*\s+bp)\s+(\d{1,3}(?:,\d{3})*)\s+bp`
-> 2. *"Replace with"*: `$2`
-> 3. *"Find-Pattern is a regular expression"*: Toggle to `Yes`
+>**Step 4**: Click on "Paste/Fetch data".
 >
-><br>
->**Step 7**: In the third Find and Replace panel *"3: Find and Replace"* set the following parameters:  
->*"Find pattern"*: `,` (Yes, just a comma)
+>**Step 5**: Change `New File` to `Estimated genome size`.
 >
-><br>
->**Step 8**: Rename the output as `Estimated genome size`.
+>**Step 6**: Paste the maximum Genome Haploid Length into the text box.
+>
+>**Step 7**: Remove the commas from the number! We only want integers.
+>
+>**Step 8**: Click "Start".
+>
+> ![Image showing where to click to upload data as pasted data.](../../images/vgp_assembly/paste_data_to_upload.png "Use the 'paste data' dialog to upload a file with the estimated genome size.")
 >
 > > <question-title></question-title>
 > >