remove data import section and replace with a short import spiel
wee-snufkin authored Dec 15, 2023
1 parent 476cce4 commit 2f84dca
Showing 1 changed file with 1 addition and 111 deletions.
112 changes: 1 addition & 111 deletions topics/single-cell/tutorials/scrna-data-ingest/tutorial.md
@@ -79,117 +79,7 @@ include images showing the structure of those files

# Data import
As you can see above, there are multiple ways to store single cell data. Therefore, there are also many ways to get that data!

## EBI SCXA Data Retrieval

If you want to use publicly available data, EBI's [Single Cell Expression Atlas](https://www.ebi.ac.uk/gxa/sc/home) is a great place to find it. You can search datasets by various criteria, either using the search box in the **Home** tab or by choosing the kingdom, experiment collection, technology type (and others) in the **Browse experiments** tab. When you find the experiment you are interested in, just click on it and the experiment ID will be displayed in the website URL, as shown below.

![Arrow pointing to the website URL where you can find experiment ID.](../../images/scrna-data/exp_id.jpg "Where to find experiment ID on the EBI Single Cell Expression Atlas website.")

Once you know the experiment ID, you can use the EBI SCXA Data Retrieval tool in Galaxy!

> <hands-on-title>Retrieving data from Single Cell Expression Atlas</hands-on-title>
>
> 1. {% tool [EBI SCXA Data Retrieval](toolshed.g2.bx.psu.edu/repos/ebi-gxa/retrieve_scxa/retrieve_scxa/v0.0.2+galaxy2) %} with the following parameters:
> - *"SC-Atlas experiment accession"*: `E-MTAB-6945`
> - *"Choose the type of matrix to download"*: `Raw filtered counts`
>
{: .hands_on}

At this point you might want to modify the files before downstream analysis. That can include reformatting the cell metadata or renaming the column headers; it all depends on your dataset and how you want to perform your analysis. It's also fine to transform those files straight away. You can now choose to create either an AnnData object or a Seurat object.

{% include _includes/cyoa-choices.html option1="Scanpy" option2="Seurat" default="Scanpy"
text="You can choose whether you want to create an AnnData object for Scanpy Analysis or an RDS object for Seurat Analysis. Galaxy has more resources for Scanpy analysis, but sometimes Seurat might have what you want." %}

<div class="AnnData object" markdown="1">

> <hands-on-title>Create AnnData object</hands-on-title>
>
> {% tool [Scanpy Read10x](toolshed.g2.bx.psu.edu/repos/ebi-gxa/scanpy_read_10x/scanpy_read_10x/1.8.1+galaxy0) %} with the following parameters:
> - *"Expression matrix in sparse matrix format (.mtx)"*: `EBI SCXA Data Retrieval on E-MTAB-6945 matrix.mtx (Raw filtered counts)`
> - *"Gene table"*: `EBI SCXA Data Retrieval on E-MTAB-6945 genes.tsv (Raw filtered counts)`
> - *"Barcode/cell table"*: `EBI SCXA Data Retrieval on E-MTAB-6945 barcodes.tsv (Raw filtered counts)`
> - *"Cell metadata table"*: `EBI SCXA Data Retrieval on E-MTAB-6945 exp_design.tsv`
{: .hands_on}

</div>

<div class="Seurat object" markdown="1">

> <hands-on-title>Create Seurat object / Loom / SCE </hands-on-title>
>
> {% tool [Seurat Read10x](toolshed.g2.bx.psu.edu/repos/ebi-gxa/seurat_read10x/seurat_read10x/3.2.3+galaxy0) %} with the following parameters:
> - *"Choose the format of the input"*: `10X-type MTX`
> - *"Expression matrix in sparse matrix format (.mtx)"*: `EBI SCXA Data Retrieval on E-MTAB-6945 matrix.mtx (Raw filtered counts)`
> - *"Gene table"*: `EBI SCXA Data Retrieval on E-MTAB-6945 genes.tsv (Raw filtered counts)`
> - *"Barcode/cell table"*: `EBI SCXA Data Retrieval on E-MTAB-6945 barcodes.tsv (Raw filtered counts)`
> - *"Cell Metadata"*: `EBI SCXA Data Retrieval on E-MTAB-6945 exp_design.tsv`
>
> You can now choose whether you want to get a Seurat object, a Loom file or a Single Cell Experiment object by selecting your option in *"Choose the format of the output"*.
{: .hands_on}

</div>

<!---
HCA doesn't work well for other datasets...
https://github.com/galaxyproject/training-material/issues/4567
-->

## Human Cell Atlas Matrix Downloader

This tool retrieves expression matrices and metadata for any public experiment available at the [Human Cell Atlas data portal](https://data.humancellatlas.org/).

To use it, simply set the project title, project label or project UUID, which can be found at the [HCA data browser](https://data.humancellatlas.org/explore/projects), and select the desired matrix format (Matrix Market or Loom).

![Image showing project UUID as a final fragment of link address, project title (self-explanatory) and project label as an entry in the box on the right side of the page.](../../images/scrna-data/HCA.jpg "Where to find project title, project label and project UUID")

For projects that have more than one organism, one needs to be specified. Otherwise, there is no need to set the species.

Let's use the suggested example of the project *Single cell transcriptome analysis of human pancreas*. If you check this project in HCA, you'll find out that it's actually its label. But it will work the same if you enter the title or UUID!

> <hands-on-title>Create AnnData object</hands-on-title>
>
> {% tool [Human Cell Atlas Matrix Downloader](toolshed.g2.bx.psu.edu/repos/ebi-gxa/hca_matrix_downloader/hca_matrix_downloader/v0.0.4+galaxy0) %} with the following parameters:
> - *"Human Cell Atlas project name/label/UUID"*: `Single cell transcriptome analysis of human pancreas`
> - *"Choose the format of matrix to download"*: `Matrix Market`
{: .hands_on}

> <details-title>What will be the output?</details-title>
>
> When "Matrix Market" is selected, outputs are in 10X-compatible Matrix Market format:
> - **Matrix (txt)**: Contains the expression values for genes (rows) and cells (columns) in raw counts. This text file is formatted as a Matrix Market file, and as such it is accompanied by separate files for the gene identifiers and the cell identifiers.
> - **Genes (tsv)**: Identifiers (column repeated) for the genes present in the matrix of expression, in the same order as the matrix rows.
> - **Barcodes (tsv)**: Identifiers for the cells of the data matrix. The file is ordered to match the columns of the matrix.
> - **Experiment Design file (tsv)**: Contains metadata for the different cells of the experiment.
>
> When "Loom" is selected, output is a single Loom HDF5 file:
> - **Loom (h5)**: Contains expression values for genes (rows) and cells (columns) in raw counts, cell metadata table and gene metadata table, in a [single HDF5 file](http://linnarssonlab.org/loompy/format/index.html).
>
{: .details}
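
To make the relationship between the three Matrix Market outputs more concrete, here is a minimal Python sketch that reads a matrix, a genes table and a barcodes table into a dictionary of counts. It uses only the standard library and is not what the Galaxy tools do internally; the function name `read_mtx_triplet` and the tiny in-memory example data are hypothetical, and the sketch assumes the usual Matrix Market coordinate layout (a size line followed by 1-based `row column value` entries).

```python
import io


def read_mtx_triplet(matrix_file, genes_file, barcodes_file):
    """Parse a 10X-style Matrix Market triplet into {(gene, barcode): count}."""
    # Genes file: one row per matrix row; the first column holds the identifier.
    genes = [line.rstrip("\n").split("\t")[0] for line in genes_file if line.strip()]
    # Barcodes file: one cell identifier per matrix column.
    barcodes = [line.strip() for line in barcodes_file if line.strip()]

    counts = {}
    size_seen = False
    for line in matrix_file:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip the header and comment lines
        if not size_seen:
            # First non-comment line: number of rows, columns and entries.
            n_rows, n_cols, _n_entries = (int(x) for x in line.split())
            if (n_rows, n_cols) != (len(genes), len(barcodes)):
                raise ValueError("matrix dimensions do not match genes/barcodes")
            size_seen = True
            continue
        row, col, value = line.split()
        # Matrix Market indices are 1-based.
        counts[(genes[int(row) - 1], barcodes[int(col) - 1])] = float(value)
    return genes, barcodes, counts


# Made-up example data: 3 genes x 2 cells with 3 non-zero entries.
matrix = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
genes = io.StringIO("ENSG01\tENSG01\nENSG02\tENSG02\nENSG03\tENSG03\n")
barcodes = io.StringIO("CELL_A\nCELL_B\n")

gene_ids, cell_ids, counts = read_mtx_triplet(matrix, genes, barcodes)
print(counts[("ENSG03", "CELL_A")])
```

The dimension check is the important part: the genes table must have one line per matrix row and the barcodes table one line per matrix column, in the same order, which is why the three files always travel together.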

If you chose the **Loom** format and you need to convert your file to another datatype, you can use {% tool [SCEasy](toolshed.g2.bx.psu.edu/repos/iuc/sceasy_convert/sceasy_convert/0.0.7+galaxy1) %} (more details in the next section). If you chose the **Matrix Market** format, you can then transform the output to AnnData or Seurat, as shown in the EBI SCXA example above. Below, you will find an example of transforming the output to an AnnData object.


> <hands-on-title>Create AnnData object</hands-on-title>
>
> {% tool [Scanpy Read10x](toolshed.g2.bx.psu.edu/repos/ebi-gxa/scanpy_read_10x/scanpy_read_10x/1.8.1+galaxy0) %} with the following parameters:
> - *"Expression matrix in sparse matrix format (.mtx)"*: `Human Cell Atlas Matrix Downloader on matrix.mtx`
> - *"Gene table"*: `Human Cell Atlas Matrix Downloader on genes.tsv`
> - *"Barcode/cell table"*: `Human Cell Atlas Matrix Downloader on barcodes.tsv`
> - *"Cell metadata table"*: `Human Cell Atlas Matrix Downloader on exp_design.tsv`
{: .hands_on}


> <tip-title>Flagging genes by using AnnData Operations</tip-title>
>
> After you create an AnnData file, you can additionally use the {% tool [AnnData Operations](toolshed.g2.bx.psu.edu/repos/ebi-gxa/anndata_ops/anndata_ops/1.8.1+galaxy92) %} tool before downstream analysis. It's a useful tool: it not only flags mitochondrial genes but also automatically calculates several metrics, such as `log1p_mean_counts`, `log1p_total_counts`, `mean_counts`, `n_cells`, `n_cells_by_counts`, `n_counts`, `pct_dropout_by_counts`, and `total_counts`.
>
> When you use it to flag mitochondrial genes, here are some formatting tips:
> - Remember to check the name of the column with gene symbols
> - This tool is case sensitive
> - No parentheses needed when typing in the values
> - Including a dash is important to identify mitochondrial genes (e.g. **MT-**)
{: .tip}
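
The formatting tips above are easier to see with a small example. The Python sketch below is only an illustration of case-sensitive prefix matching, assuming the flag is applied to the start of each gene symbol; it is not the AnnData Operations implementation, and the function name `flag_mitochondrial` and the gene list are made up for this example.

```python
def flag_mitochondrial(gene_symbols, prefix="MT-"):
    """Return one boolean per gene symbol: True when it starts with the prefix."""
    # str.startswith is case sensitive, so "mt-Co1" is not matched by "MT-".
    return [symbol.startswith(prefix) for symbol in gene_symbols]


symbols = ["MT-CO1", "MTOR", "mt-Co1", "ACTB"]

# With the dash, only the mitochondrial gene is flagged; MTOR is left alone.
human_flags = flag_mitochondrial(symbols)
print(human_flags)

# Lower-case symbols need a lower-case prefix.
lowercase_flags = flag_mitochondrial(symbols, prefix="mt-")
print(lowercase_flags)

# Without the dash, MTOR would be flagged by mistake.
no_dash_flags = flag_mitochondrial(symbols, prefix="MT")
print(no_dash_flags)
```

Dropping the dash makes the prefix match unrelated genes such as `MTOR`, and a prefix in the wrong case matches nothing, which is why it is worth checking how gene symbols are written in your dataset first.
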
Obviously, before any format conversion, we need to import the data. In our tutorials we often use [Zenodo](https://zenodo.org/) links, but you can also upload the files directly from your computer. There are also publicly available resources which you can easily access through public atlases, such as [Single Cell Expression Atlas](https://www.ebi.ac.uk/gxa/sc/home) or [Human Cell Atlas data portal](https://data.humancellatlas.org/). We created a [dedicated tutorial]({% link topics/single-cell/tutorials/EBI-retrieval/tutorial.md %}) to show how to use those atlases to retrieve data. But today we're here to focus on data conversion!


# SCEasy Tool