Merge pull request #4510 from wee-snufkin/alevin-commandline

New tutorial - generating a single cell matrix using Alevin (bash + R)
galaxyproject · Dec 8, 2023 · 5c4f295
2 parents dbcf86c + 0bc3c82

Showing 5 changed files with 873 additions and 0 deletions.
diff --git a/topics/single-cell/images/scrna-pre-processing/bash.png b/topics/single-cell/images/scrna-pre-processing/bash.png
diff --git a/topics/single-cell/images/scrna-pre-processing/switch_kernel.jpg b/topics/single-cell/images/scrna-pre-processing/switch_kernel.jpg
diff --git a/topics/single-cell/tutorials/alevin-commandline/preamble.md b/topics/single-cell/tutorials/alevin-commandline/preamble.md
@@ -0,0 +1,95 @@
+# Introduction 
+
+This tutorial is part of the [Single-cell RNA-seq: Case Study]({% link topics/single-cell/index.md %}) series and focuses on generating a single-cell matrix using Alevin ({% cite srivastava2019alevin %}) from the bash command line. It replicates the [previous tutorial]({% link topics/single-cell/tutorials/scrna-case_alevin/tutorial.md %}), guiding you through the same steps while giving you a better understanding of what is happening ‘behind the scenes’ or ‘inside the tools’, if you will.
+As a recap, we will go from raw FASTQ files to a cell x gene data matrix in AnnData format. After completing the previous tutorial, you should already know what a data matrix and the AnnData format are. We will perform the following steps:
+1. Getting the appropriate files
+2. Making a transcript-to-gene ID mapping
+3. Creating a Salmon index
+4. Quantifying transcript expression using Alevin
+5. Creating a SummarizedExperiment from the Alevin output
+6. Adding metadata
+7. Combining data from multiple samples
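
Step 2 in the list above boils down to a two-column table that pairs each transcript ID with its gene ID. As a rough illustration only (this is not the tutorial's own method, which relies on the packages installed later in this file), the sketch below pulls those pairs out of a GTF annotation with `awk`. The function name `tx2gene`, the temporary file, and the toy IDs (`txA.1`, `geneA`, and so on) are all made up for the example.

```shell
# Hypothetical helper: print "transcript_id<TAB>gene_id" for every transcript row of a GTF file.
tx2gene() {
  awk -F '\t' '$3 == "transcript" {
    tx = ""; gene = ""
    n = split($9, attrs, ";")            # column 9 holds the attribute pairs
    for (i = 1; i <= n; i++) {
      a = attrs[i]
      gsub(/^ +| +$/, "", a)             # trim surrounding spaces
      if (a ~ /^transcript_id /) { sub(/^transcript_id /, "", a); gsub(/"/, "", a); tx = a }
      else if (a ~ /^gene_id /)  { sub(/^gene_id /, "", a);       gsub(/"/, "", a); gene = a }
    }
    if (tx != "" && gene != "") print tx "\t" gene
  }' "$1"
}

# Toy GTF with invented IDs, written to a temporary file so the snippet runs anywhere.
gtf="$(mktemp)"
{
  printf '1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "geneA";\n'
  printf '1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "txA.1";\n'
  printf '1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "geneA"; transcript_id "txA.1";\n'
  printf '2\ttoy\ttranscript\t50\t700\t.\t-\t.\tgene_id "geneB"; transcript_id "txB.1";\n'
} > "$gtf"

tx2gene "$gtf"
```

Only rows whose feature type is `transcript` are used, so gene and exon rows do not produce duplicate pairs. Real annotations can carry extra quirks (version suffixes, missing attributes), so treat this as a way to see the shape of the mapping rather than a replacement for a dedicated tool.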
+
+> <warning-title>This tutorial is for teaching purposes</warning-title>
+> We created this tutorial as a gateway to coding to demonstrate what happens behind the Galaxy buttons in the [corresponding tutorial]({% link topics/single-cell/tutorials/scrna-case_alevin/tutorial.md %}). This is why we are using massively subsampled data: it's only for demonstration purposes. If you want to perform this tutorial fully on your own data, you will need more compute power, because it's simply not going to scale here. You can always use the Galaxy button version of Alevin, which has large memory and a few cores dedicated to it.
+{: .warning}
+
+
+## Launching JupyterLab
+
+> <warning-title>Data uploads & JupyterLab</warning-title>
+> There are a few ways of importing and uploading data into JupyterLab. You might find yourself accidentally doing this differently than the tutorial, and that's ok. There are a few key steps where you will call files from a location - if these don't work for you, check that the file location is correct and change accordingly!
+{: .warning}
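
The warning above asks you to confirm file locations whenever a step cannot find its input. A small guard like the following is one way to do that from a Bash cell before running a longer command. `require_file` and the demo file are hypothetical names used only for this sketch; they are not part of the tutorial.

```shell
# Hypothetical helper: report whether a file exists before a later step tries to read it.
require_file() {
  if [ -f "$1" ]; then
    printf 'found: %s\n' "$1"
  else
    printf 'missing: %s (current directory: %s)\n' "$1" "$PWD" >&2
    return 1
  fi
}

# Demo with a throwaway file so the snippet runs anywhere.
demo="$(mktemp)"
require_file "$demo"
require_file "$demo.does-not-exist" || echo "check the path and adjust it"
```

Printing the current directory alongside the missing path helps when a relative path is being resolved from a different folder than expected.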
+
+> {% snippet faqs/galaxy/interactive_tools_jupyter_launch.md %}
+
+Welcome to JupyterLab!
+
+> <warning-title>Danger: You can lose data!</warning-title>
+> Do NOT delete or close this notebook dataset in your history. YOU WILL LOSE IT!
+{: .warning}
+
+## Open the notebook
+
+You have two options for how to proceed with this JupyterLab tutorial - you can run the tutorial from a pre-populated notebook, or you can copy and paste the code for each step into a fresh notebook and run it. The initial instructions for both options are below.
+
+> <hands-on-title>Option 1: Open the notebook directly in JupyterLab</hands-on-title>
+>
+> 1. Open a `Terminal` in JupyterLab with File -> New -> Terminal
+>
+>   ![Screenshot of the Launcher tab with an arrow indicating where to find Terminal.](../../images/scrna-casestudy-monocle/terminal_choose.jpg "This is what the Launcher tab looks like and where you can find Terminal.")
+>
+> 2. Run
+>    ```
+>    wget {{ ipynbpath }}
+>    ```
+>
+> 3. Select the notebook that appears in the list of files on the left.
+>
+>
+> Remember that you can also download this {% icon notebook %} [Jupyter Notebook]({{ ipynbpath }}) from the {% icon galaxy_instance %} Supporting Materials in the Overview box at the beginning of this tutorial.
+{: .hands_on}
+
+> <hands-on-title>Option 2: Creating a notebook</hands-on-title>
+>
+> 1. If you are in the Launcher window, select the **Bash** icon under **Notebook** (to open a new Launcher, go to File -> New Launcher).
+>
+>   ![Bash icon](../../images/scrna-pre-processing/bash.png "Bash Notebook Button")
+>
+> 2. Save your file (**File**: **Save**, or click the {% icon galaxy-save %} Save icon at the top left)
+>
+> 3. If you right-click on the file in the folder window on the left, you can rename your file `whateveryoulike.ipynb`
+>
+{: .hands_on}
+
+> <warning-title>You should <b>Save</b> frequently!</warning-title>
+> This is both for good practice and to protect you in case you accidentally close the browser. Your environment will still run, so it will contain the last saved notebook you have. You might eventually stop your environment after this tutorial, but ONLY once you have saved and exported your notebook (more on that at the end!). Note that you can have multiple notebooks going at the same time within this JupyterLab, so if you do, you will need to save and export each individual notebook. You can also download them at any time.
+{: .warning}
+
+Let's crack on!
+
+{% snippet topics/single-cell/faqs/notebook_warning.md %}
+
+
+## Installation
+
+Before we start working on the tutorial notebook, we need to install the required packages.
+
+> <hands-on-title>Installing the packages</hands-on-title>
+>
+> 1. Navigate back to the `Terminal` (if you haven't opened it yet, just go to File -> New -> Terminal)
+> 2. In the open Terminal tab, enter the following commands, one line at a time:
+>    ```
+>    conda install -y -c bioconda bioconductor-tximeta    # install this first to avoid problems with re-installation of rtracklayer
+>    ```
+>    ```
+>    conda install -y -c bioconda atlas-gene-annotation-manipulation
+>    ```
+>    ```
+>    conda install -y -c bioconda bioconductor-dropletutils
+>    ```
+>
+{: .hands_on}
+
+
+Installation will take a while, so while it's running, you can open the notebook and follow the rest of this tutorial there!
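
Once the installs finish, it can be worth confirming that the executables you expect are on your `PATH` before running the notebook. The sketch below is one possible way to check this, not something the tutorial prescribes. `check_tools` is a hypothetical helper, and `conda` and `Rscript` are only example names; the R packages themselves would still need to be loaded in R to be sure they installed correctly.

```shell
# Hypothetical helper: report which of the named executables can be found on PATH.
# Returns 0 if all are present, 1 if at least one is missing.
check_tools() {
  _missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'ok       %s\n' "$tool"
    else
      printf 'missing  %s\n' "$tool"
      _missing=1
    fi
  done
  return "$_missing"
}

# Example call; the tool names are illustrative assumptions.
check_tools conda Rscript || echo "some tools are not on PATH yet"
```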
diff --git a/topics/single-cell/tutorials/alevin-commandline/tutorial.bib b/topics/single-cell/tutorials/alevin-commandline/tutorial.bib
@@ -0,0 +1,66 @@
+@article{Bacon2018,
+  doi = {10.3389/fimmu.2018.02523},
+  url = {https://doi.org/10.3389/fimmu.2018.02523},
+  year = {2018},
+  month = nov,
+  publisher = {Frontiers Media {SA}},
+  volume = {9},
+  author = {Wendi A. Bacon and Russell S. Hamilton and Ziyi Yu and Jens Kieckbusch and Delia Hawkes and Ada M. Krzak and Chris Abell and Francesco Colucci and D. Stephen Charnock-Jones},
+  title = {Single-Cell Analysis Identifies Thymic Maturation Delay in Growth-Restricted Neonatal Mice},
+  journal = {Frontiers in Immunology}
+}
+
+@article{Lun2019,
+  doi = {10.1186/s13059-019-1662-y},
+  url = {https://doi.org/10.1186/s13059-019-1662-y},
+  year = {2019},
+  month = mar,
+  publisher = {Springer Science and Business Media {LLC}},
+  volume = {20},
+  number = {1},
+  author = {Aaron T. L. Lun and Samantha Riesenfeld and Tallulah Andrews and The Phuong Dao and Tomas Gomes and John C. Marioni},
+  title = {{EmptyDrops}: distinguishing cells from empty droplets in droplet-based single-cell {RNA} sequencing data},
+  journal = {Genome Biology}
+}
+
+@article{Love2020,
+  doi = {10.1371/journal.pcbi.1007664},
+  url = {https://doi.org/10.1371/journal.pcbi.1007664},
+  year = {2020},
+  month = feb,
+  publisher = {Public Library of Science ({PLoS})},
+  volume = {16},
+  number = {2},
+  pages = {e1007664},
+  author = {Michael I. Love and Charlotte Soneson and Peter F. Hickey and Lisa K. Johnson and N. Tessa Pierce and Lori Shepherd and Martin Morgan and Rob Patro},
+  editor = {Mihaela Pertea},
+  title = {Tximeta: Reference sequence checksums for provenance identification in {RNA}-seq},
+  journal = {{PLOS} Computational Biology}
+}
+
+@article{srivastava2019alevin,
+  doi = {10.1186/s13059-019-1670-y},
+  url = {https://doi.org/10.1186/s13059-019-1670-y},
+  year = {2019},
+  month = mar,
+  publisher = {Springer Science and Business Media {LLC}},
+  volume = {20},
+  number = {1},
+  author = {Avi Srivastava and Laraib Malik and Tom Smith and Ian Sudbery and Rob Patro},
+  title = {Alevin efficiently estimates accurate gene abundances from {dscRNA}-seq data},
+  journal = {Genome Biology}
+}
+
+@article{Benjamini1995,
+  doi = {10.1111/j.2517-6161.1995.tb02031.x},
+  url = {https://doi.org/10.1111/j.2517-6161.1995.tb02031.x},
+  year = {1995},
+  month = jan,
+  publisher = {Wiley},
+  volume = {57},
+  number = {1},
+  pages = {289--300},
+  author = {Yoav Benjamini and Yosef Hochberg},
+  title = {Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing},
+  journal = {Journal of the Royal Statistical Society: Series B (Methodological)}
+}