Merge pull request #160 from ARTbio/metavisitor_doc

Metavisitor doc
ARTbio · May 16, 2016 · 26b697b
2 parents e3b8983 + 4fa45fb
commit 26b697b
Showing 4 changed files with 35 additions and 10 deletions.
diff --git a/docs/metavisitor_ansible.md b/docs/metavisitor_ansible.md
@@ -62,7 +62,7 @@ In this specific case, add in the hosts inventory file:
 
 ```
 [metavisitor]
-192.54.201.126 ansible_ssh_user="root" ansible_ssh_private_key_file="~/.ssh/aws_private_key.pem"
+192.54.201.126 ansible_ssh_user="ubuntu" ansible_ssh_private_key_file="~/.ssh/aws_private_key.pem"
 
 [aws]
 192.54.201.126

diff --git a/docs/metavisitor_configure_references.md b/docs/metavisitor_configure_references.md
@@ -1,6 +1,6 @@
 Once you know how to access your Metavisitor Galaxy instance with a web browser and are able to perform basic start/stop/restart operations, there is still some work needed to import and configure reference data (genomes) so that they are directly available to all instance users for running tools and workflows.
 
-Here we provide the step-by-step description of what we *actually did ourselves* to prepare our Metavisitor instance before performing the analyses described here [here](http://dx.doi.org/10.1101/048983).
+Here we provide the step-by-step description of what we *actually did ourselves* to prepare our Metavisitor instance before performing the analyses described [here](http://dx.doi.org/10.1101/048983).
 
 ## 1. Connect to your Metavisitor Galaxy admin account with your web browser
 

diff --git a/docs/use_case_2.md b/docs/use_case_2.md
@@ -57,7 +57,32 @@ You may follow the link to the new history when the workflow is started.
 3. Be careful to select `long read RNAseq datasets` for step 1 (Input Dataset Collection)
 4. For step 2, the option `protein vir1 blast database` is forced, because the workflow expects a protein blast database for this step and only one dataset with this datatype is available in the history
 5. Click the `Send results to a new history` checkbox and rename the history to "History for Use Case 2-1".
-6. Run Workflow !
+6. Run Workflow.
+
+## Re-mapping of the small RNA reads (ERP012577) to the AnCV genome (KU169878).
+The previous workflow allowed us to assemble a large contig of 8919 nt which significantly matched structural and non-structural polyproteins of Drosophila C Virus and Cricket Paralysis Virus in blastx alignments (see the dataset `blast analysis, by subjects` of the history). This large contig corresponds to the genome of a new Anopheles C Virus deposited in the NCBI nucleotide database under accession number KU169878 (see the [companion Metavisitor article](http://dx.doi.org/10.1101/048983) and [Carissimo et al](http://dx.doi.org/10.1371/journal.pone.0153881)).
+
+Here, we are going to perform a few steps manually before using another workflow in the history 2-2 to remap the ERP012577 small RNA reads to the AnCV genome.
+
+1. Look at the `blast analysis, by subjects` dataset and copy the name of the 8919 nt contig that aligned to DCV and CrPV sequences. Note that this name may vary from one Oases run to another because the Oases algorithm is not totally deterministic. In the [companion Metavisitor article](http://dx.doi.org/10.1101/048983), this name was Locus_69_Transcript_1/1_Confidence_0.000_Length_8919.
+    - Copy this name, find the tool `Pick Fasta sequences with header satisfying a query string` in the Galaxy tool bar, and paste this name in the field `Select sequences with this string in their header` of the tool form. Select the dataset `Oases_optimiser on data 20: Denovo assembled transcripts` as a source file, and run the tool.
+2. Now, we are going to change the header of the previously extracted fasta sequence using the tool `Regex Find And Replace`.
+    - Select the previous dataset `Pick Fasta sequences on data 21 including 'Locus_69_Transcript_1/1_Confidence_0.000_Length_8919' in header` as input dataset for this tool. Click on `+ Insert Check`. Use `Locus_69_Transcript_1/1_Confidence_0.000_Length_8919` as *Find Regex* and `Anopheles_C_Virus|KU169878` as *Replacement*. Execute the tool. Look at the resulting dataset.
+
+3. Copy the dataset collection `Small RNA reads ERP012577` from the history `Input data for Use Cases 2-1 and 2-2` into the *current* history `Use Case 2-2`. You may have to refresh the history bar to see this collection and the attached datasets appear.
+
+We are now ready to run the workflow.
+
+----
+
+1. In the workflow menu, pick the workflow `Metavisitor: Workflow for remapping in Use Cases 2-1,2` and select the `run` option.
+2. In the workflow form, ensure that `Small RNA reads ERP012577` is selected for step 1 and `Regex Find And Replace on data 28` is selected for step 2 (this should be the case if you followed the instructions).
+3. This time, *do not* check the box `Send results to a new history` and directly click the `Run workflow` button.
+
+This workflow will provide you with a graphical view of ERP012577 small RNA mapping to the AnCV genome.
+
+
+
 
 
 

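Editorial note on the `docs/use_case_2.md` change above: the two manual steps it adds (pick the contig by a string in its header, then rewrite that header) can be sketched outside Galaxy as follows. This is only an illustration of the logic, not the implementation of the `Pick Fasta sequences with header satisfying a query string` or `Regex Find And Replace` tools. The function names and the toy sequences are hypothetical, and the sketch matches the contig name literally (via `re.escape`), which is an assumption about the intended behaviour.

```python
import re


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pick_and_rename(text, query, new_name):
    """Keep records whose header contains `query`, replacing it with `new_name`."""
    pattern = re.compile(re.escape(query))  # literal match (assumption)
    picked = []
    for header, seq in parse_fasta(text):
        if query in header:
            # A callable replacement avoids any backslash interpretation.
            picked.append((pattern.sub(lambda _m: new_name, header), seq))
    return picked


# Toy assembly: the sequences are made up; only the header names come from the text.
toy_assembly = """>Locus_1_Transcript_1/1_Confidence_0.000_Length_12
ACGTACGTACGT
>Locus_69_Transcript_1/1_Confidence_0.000_Length_8919
TTGACC
GGA
"""

picked = pick_and_rename(
    toy_assembly,
    "Locus_69_Transcript_1/1_Confidence_0.000_Length_8919",
    "Anopheles_C_Virus|KU169878",
)
for header, seq in picked:
    print(f">{header}\n{seq}")
```

Because the contig name may differ between Oases runs, the query string would have to be taken from your own `blast analysis, by subjects` dataset rather than hard-coded as it is here.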
diff --git a/docs/use_case_3-3.md b/docs/use_case_3-3.md
@@ -87,7 +87,7 @@ When you have finishid with the 55 datasets, make sure to change their datatype
 1. Stay in the history `Input data for Use Case 3-3`
 - pick the workflow `Metavisitor: Workflow for Use Case 3-3` in the workflows menu. It is possible that the workflow manager complains about settings for the Trinity tool that is used in this workflow. This is a minor issue if it happens: just edit the workflow, click on the tool `Trinity` and specify the number of processors according to your computing infrastructure. Save the workflow and select the `run` option.
 - Before Step 1, you have to specify some parameters at run time. For Ebola virus, the field `target_virus` has to be filled with `Ebola` and the field `reference_virus` has to be filled with `NC_002549.1` (as a guide for reconstruction of the Ebola virus genome).
-- For Step 1, select `Ebola virus` (this should be already selected).
+- For Step 1, select `Ebola virus`.
 - For Step 2, select the `nucleotide vir1 blast database` (this should also be already selected)
 - As usual, check the box `Send results to a new history`, edit the name of the new history to `Use Case 3-3 Ebola virus`, and `Execute` the workflow. Note that for complex workflows with dataset collections as input, the notification that the workflow has started may take time to show up; you may even get a "504 Gateway Time-out" warning. This is not a serious issue: just go to your `User` -> `Saved history` menu, where you will see your `Use Case 3-3 Ebola virus` history running and be able to access it.
 
@@ -96,19 +96,19 @@ The workflow for Use Case 3-3 may take a long time. Be patient.
 ## History for Use Case 3-3 / Lassa virus, segment L
 1. Stay in the history `Input data for Use Case 3-3`
 - pick the workflow `Metavisitor: Workflow for Use Case 3-3` in the workflows menu. It is possible that the workflow manager complains about settings for the Trinity tool that is used in this workflow. This is a minor issue if it happens: just edit the workflow, click on the tool `Trinity` and specify the number of processors according to your computing infrastructure. Save the workflow and select the `run` option.
-- Before Step 1, you have to specify some parameters at run time. For Ebola virus, the field `target_virus` has to be filled with `Lassa` and the field `reference_virus` has to be filled with `NC_002549.1` (as a guide for reconstruction of the Ebola virus genome).
-- For Step 1, select `Ebola virus` (this should be already selected).
+- Before Step 1, you have to specify some parameters at run time. For Lassa virus, the field `target_virus` has to be filled with `Lassa` and the field `reference_virus` has to be filled with `NC_004297.1` (as a guide for reconstruction of the segment L of the Lassa virus genome).
+- For Step 1, select `Lassa virus`.
 - For Step 2, select the `nucleotide vir1 blast database` (this should also be already selected)
-- As usual, check the box `Send results to a new history`, edit the name of the new history to `Use Case 3-3 Ebola virus`, and `Execute` the workflow ! Note, that for complex workflows with dataset collections in input, the actual warning that the workflow is started make take time to show up; you can even have a "504 Gateway Time-out" warning. This is not a serious issue: just go in your `User` -> `Saved history` menu, you will see your `Use Case 3-3 Ebola virus` history running and you will be able to access it.
+- As usual, check the box `Send results to a new history`, edit the name of the new history to `Use Case 3-3 Lassa virus segment L`, and `Execute` the workflow. Note that for complex workflows with dataset collections as input, the notification that the workflow has started may take time to show up; you may even get a "504 Gateway Time-out" warning. This is not a serious issue: just go to your `User` -> `Saved history` menu, where you will see your `Use Case 3-3 Lassa virus segment L` history running and be able to access it.
 
 The workflow for Use Case 3-3 may take a long time. Be patient.
 
 ## History for Use Case 3-3 / Lassa virus, segment S
 1. Stay in the history `Input data for Use Case 3-3`
 - pick the workflow `Metavisitor: Workflow for Use Case 3-3` in the workflows menu. It is possible that the workflow manager complains about settings for the Trinity tool that is used in this workflow. This is a minor issue if it happens: just edit the workflow, click on the tool `Trinity` and specify the number of processors according to your computing infrastructure. Save the workflow and select the `run` option.
-- Before Step 1, you have to specify some parameters at run time. For Ebola virus, the field `target_virus` has to be filled with `Lassa` and the field `reference_virus` has to be filled with `NC_004296.1` (as a guide for reconstruction of the Ebola virus genome).
-- For Step 1, select `Ebola virus` (this should be already selected).
+- Before Step 1, you have to specify some parameters at run time. For Lassa virus, the field `target_virus` has to be filled with `Lassa` and the field `reference_virus` has to be filled with `NC_004296.1` (as a guide for reconstruction of the segment S of the Lassa virus genome).
+- For Step 1, select `Lassa virus` (this should be already selected).
 - For Step 2, select the `nucleotide vir1 blast database` (this should also be already selected)
-- As usual, check the box `Send results to a new history`, edit the name of the new history to `Use Case 3-3 Ebola virus`, and `Execute` the workflow ! Note, that for complex workflows with dataset collections in input, the actual warning that the workflow is started make take time to show up; you can even have a "504 Gateway Time-out" warning. This is not a serious issue: just go in your `User` -> `Saved history` menu, you will see your `Use Case 3-3 Ebola virus` history running and you will be able to access it.
+- As usual, check the box `Send results to a new history`, edit the name of the new history to `Use Case 3-3 Lassa virus segment S`, and `Execute` the workflow. Note that for complex workflows with dataset collections as input, the notification that the workflow has started may take time to show up; you may even get a "504 Gateway Time-out" warning. This is not a serious issue: just go to your `User` -> `Saved history` menu, where you will see your `Use Case 3-3 Lassa virus segment S` history running and be able to access it.
 
 The workflow for Use Case 3-3 may take a long time. Be patient.
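
Editorial note on the `docs/use_case_3-3.md` change above: the three runs differ only in a handful of run-time values, which is where the copy-paste errors corrected by this commit crept in. The sketch below collects those values in one place and checks them for accidental duplicates. It does not talk to Galaxy. The values of `target_virus`, `reference_virus` and the history names are taken from the corrected text, whereas the dictionary layout, the `input_collection` key and the `check_runs` helper are hypothetical.

```python
# Run-time values for the three Use Case 3-3 runs, as stated in the corrected docs.
RUNS = [
    {
        "input_collection": "Ebola virus",
        "target_virus": "Ebola",
        "reference_virus": "NC_002549.1",
        "history_name": "Use Case 3-3 Ebola virus",
    },
    {
        "input_collection": "Lassa virus",
        "target_virus": "Lassa",
        "reference_virus": "NC_004297.1",  # guide for segment L
        "history_name": "Use Case 3-3 Lassa virus segment L",
    },
    {
        "input_collection": "Lassa virus",
        "target_virus": "Lassa",
        "reference_virus": "NC_004296.1",  # guide for segment S
        "history_name": "Use Case 3-3 Lassa virus segment S",
    },
]


def check_runs(runs):
    """Return a list of problems: repeated history names or reference accessions."""
    problems = []
    for key in ("history_name", "reference_virus"):
        seen = set()
        for run in runs:
            value = run[key]
            if value in seen:
                problems.append(f"duplicate {key}: {value}")
            seen.add(value)
    return problems


problems = check_runs(RUNS)
for run in RUNS:
    print(f"{run['history_name']}: target_virus={run['target_virus']}, "
          f"reference_virus={run['reference_virus']}")
print("problems:", problems if problems else "none")
```

A check like this would have flagged the pre-fix text, in which the Lassa sections reused the Ebola history name and the segment L section reused the Ebola reference accession.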