New tutorial - single cell data import and format conversion #4590

Merged on Feb 13, 2024 (128 commits)

wee-snufkin
Collaborator

@wee-snufkin wee-snufkin commented Dec 14, 2023

Tutorial describing the most common single-cell datatypes, how to import data using EBI SCXA and HCA tools, and how to convert between the formats.

Can leave it as one big tutorial or split into separate smaller ones.

@nomadscientist

UPDATES AFTER THE REVIEW:

  • move the data import section from here to the separate tutorial
  • use downsampled AnnData input file
  • move CDS data conversion section from monocle tutorial into here
  • update histories, workflows and tests
  • add Seurat -> AnnData conversion (SCEasy)

@github-actions github-actions bot added the news label Jan 17, 2024
@wee-snufkin
Collaborator Author

> @wee-snufkin: if you would like to write a news post announcing your new tutorials or other updates, please feel free to always include that in the PR as well (but up to you to decide if that makes sense)

Thanks @shiltemann! I've added a wee news post to this PR and updated some of the contributions to tutorials :)

Member

@pavanvidem pavanvidem left a comment

Hi, this tutorial is very much needed in the community. Every now and then we see questions about getting data into Galaxy and converting between formats. A very good start!!
I like the concept of this tutorial, but it would be nice to have some alternative hands-ons for each type of conversion. The tutorial seems to be built to work with other tutorials based on the EBI tools, but some users may just need to perform basic clustering with the PBMC tutorial. Some of the conversions can be done in a single click with the latest SCEasy, which produces the latest version of AnnData, compatible with the PBMC tutorial.
Currently, it seems like these are the only conversion options. Do you think it makes sense to add an alternative hands-on with the SCEasy converter (with choose your tutorial)? Then we could produce both the latest and the old versions of AnnData.

> 3. Alternatively, you can import history where we created the Seurat object: [Input history](https://singlecell.usegalaxy.eu/u/j.jakiela/h/ebi-scxa---anndata-scanpy-or-seurat-object-1)
>
> {% snippet faqs/galaxy/histories_import.md %}
>
Member

By default, Galaxy detected this file as rds format, so the SCEasy tool did not accept it as input. An additional step to change the datatype to rdata would be nice.
Edit: I see that you have added a new trick for running a tool on an unsupported file format.

Collaborator Author

Thanks for spotting that - I can add a note for the users to change the datatype to rdata when importing a file in this way.


Most of our conversions involve extracting tables from different data objects and importing them into the target object.

First, we will extract observations (cell metadata) and the full matrix from our AnnData.
Member

Is there any reason to extract the matrix and then use the DropletUtils and Seurat Read10x tools? This can be done in a single step with the new SCEasy converter.

Collaborator Author

Indeed, the idea was to show the manual conversion to give the users an idea of the structure of those objects and how they are related.
The new SCEasy converter was introduced first in this tutorial, because it's the easiest to use and probably does all the conversion the users might need. I thought it's straightforward enough to mention it at the beginning and then just show other methods, but your comment gave me an idea of adding an additional hands-on box here that will show the one-step conversion with the new SCEasy converter.
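
To make the manual route discussed above more concrete, here is a minimal sketch in plain Python (standard library only) of what "extracting tables" from an AnnData-like object and rewriting them as a 10x-style triplet amounts to. It is not taken from the tutorial or from any Galaxy tool: the function name `anndata_like_to_10x` and the toy data are hypothetical, and the second column of the genes table simply repeats the gene name as a placeholder. The one structural point it relies on is that AnnData stores the matrix as cells x genes, whereas the 10x-style Matrix Market file is genes x cells with 1-based indices.

```python
def anndata_like_to_10x(x, obs_names, var_names):
    """Turn a dense cells x genes table into 10x-style text parts.

    x          -- list of rows, one row per cell (AnnData orientation)
    obs_names  -- cell barcodes, one per row of x
    var_names  -- gene names, one per column of x
    Returns (matrix_mtx, genes_tsv, barcodes_tsv) as strings.
    """
    if len(x) != len(obs_names) or any(len(row) != len(var_names) for row in x):
        raise ValueError("matrix shape must be len(obs_names) x len(var_names)")

    # Keep only non-zero counts, stored as (gene, cell, count) with 1-based
    # indices: the matrix is transposed relative to the AnnData layout.
    entries = []
    for cell_idx, row in enumerate(x):
        for gene_idx, count in enumerate(row):
            if count != 0:
                entries.append((gene_idx + 1, cell_idx + 1, count))

    lines = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{len(var_names)} {len(obs_names)} {len(entries)}",  # rows cols nnz
    ]
    lines += [f"{g} {c} {v}" for g, c, v in entries]
    matrix_mtx = "\n".join(lines) + "\n"

    # Placeholder: gene ID and gene symbol columns both hold the same name.
    genes_tsv = "".join(f"{name}\t{name}\n" for name in var_names)
    barcodes_tsv = "".join(f"{name}\n" for name in obs_names)
    return matrix_mtx, genes_tsv, barcodes_tsv


# Toy example: 2 cells x 3 genes.
x = [[0, 3, 1],
     [2, 0, 0]]
mtx, genes, barcodes = anndata_like_to_10x(
    x, ["cell_A", "cell_B"], ["Gene1", "Gene2", "Gene3"]
)
print(mtx)
```

Seeing the three parts separately is the point of the manual route: the matrix, the gene table and the barcode table are the same pieces that each target object reassembles in its own layout.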

>
{: .hands_on}

Finally, let's combine those files that we have just generated and turn them into the SingleCellExperiment!
Member

Alternatively, this can be done with SCEasy: AnnData -> Seurat -> SCE.

Collaborator Author

Right, will add an additional hands-on box!
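
The chained route suggested in this thread can be pictured as a search over conversion steps. The sketch below is only an illustration: the edge list contains just the conversions named in this conversation (AnnData -> Seurat, Seurat -> SCE, Seurat -> AnnData), not the full set that SCEasy supports, and `find_route` is a hypothetical helper rather than part of any tool.

```python
from collections import deque

# Assumption: only the conversions mentioned in this PR thread are listed.
CONVERSIONS = {
    "AnnData": ["Seurat"],
    "Seurat": ["SingleCellExperiment", "AnnData"],
}


def find_route(source, target):
    """Breadth-first search for the shortest chain of conversions."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of listed conversions reaches the target


route = find_route("AnnData", "SingleCellExperiment")
print(" -> ".join(route))
```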

>
{: .hands_on}

And that's it! The latest tool {% tool [SCEasy Converter](toolshed.g2.bx.psu.edu/repos/iuc/sceasy_convert/sceasy_convert/0.0.7+galaxy2) %} will do the same but the output file will be the newest AnnData version and will not work with the tool used below.
Member

Maybe good to have an alternative hands-on with the latest SCEasy converter tool too.

Collaborator Author

I only mentioned the latest SCEasy here, without a hands-on box, because I thought it was similar enough to the old one that users would know how to use it. It would also produce AnnData output that is not compatible with the Filter, Plot, Explore workflow, and the main aim of this section was to do the conversion and pass the converted object into that workflow. As you pointed out previously, that workflow needs an older version of AnnData, hence the older version of SCEasy is used here.


> <hands-on-title> Modify AnnData object to make it compatible with Filter, Plot, Explore workflow </hands-on-title>
>
> 1. {% tool [AnnData Operations](toolshed.g2.bx.psu.edu/repos/ebi-gxa/anndata_ops/anndata_ops/1.8.1+galaxy92) %}
Member

Did it work for you? It failed for me.

Collaborator Author

Oh no!
It did work for me: https://singlecell.usegalaxy.eu/u/j.jakiela/h/seurat---anndata - see dataset 266 old SCEasy conversion, then 268 Anndata operations and all datasets that come later are the outputs of the Filter, Plot, Explore workflow.

@wee-snufkin
Collaborator Author

Apologies for taking so long to finish updating this PR, but I think I'm done!
All the tests pass for the tutorial part (5debf55); linting only fails for the cyoa bit.

Member

@hexylena hexylena left a comment

That linting error is fine 👍

@nomadscientist nomadscientist merged commit 67d35f7 into galaxyproject:main Feb 13, 2024
2 of 3 checks passed
7 participants