Updated vignette to use listDatasets.
Removed unnecessary 10X commentary.
LTLA committed Jun 7, 2020
1 parent 2410cfe commit 1d1d237
Showing 1 changed file with 28 additions and 53 deletions.
81 changes: 28 additions & 53 deletions vignettes/scRNAseq.Rmd
@@ -39,54 +39,34 @@ for further information on how to work with `SingleCellExperiment` objects.

# Available data sets

The available data sets can be split into two categories.
The first category contains expression matrices that have been generated by the `r Biocpkg("scRNAseq")` authors
from the raw sequencing data for each experiment.
This includes:

- `ReprocessedFluidigmData()` provides 65 cells from @pollen2014lowcoverage.
- `ReprocessedTh2Data()` provides 96 T helper cells from @mahata2014singlecell.
- `ReprocessedAllenData()` provides 379 cells from the mouse visual cortex,
which is a subset of the data from @tasic2016adult.

The second category contains expression matrices that were provided by the authors of each study.
No further reprocessing has been performed other than some cross-checks between the count matrix and the sample metadata.

| Study | System | Number of cells | Function |
|-----------------------|------------------|-------------------|---------------------|
|@aztekin2019identification | Xenopus tail | 13199 | `AztekinTailData()` |
|@bach2017differentiation | Mouse mammary gland | 25806 | `BachMammaryData()` |
|@baron2016singlecell | Human pancreas | 8569 | `BaronPancreasData('human')` |
|@baron2016singlecell | Mouse pancreas | 1886 | `BaronPancreasData('mouse')` |
|@buettner2015computational | Mouse embryonic stem cells | 288 | `BuettnerESCData()` |
|@campbell2017molecular | Mouse brain | 21086 | `CampbellBrainData()` |
|@chen2017singlecell | Mouse brain | 14437 | `ChenBrainData()` |
|@grun2016denovo | Mouse haematopoietic stem cells | 1915 | `GrunHSCData()` |
|@grun2016denovo | Human pancreas | 1728 | `GrunPancreasData()` |
|@kolodziejczyk2015singlecell | Mouse embryonic stem cells | 704 | `KolodziejczykESCData()` |
|@lamanno2016molecular | Human embryonic stem cells | 1715 | `LaMannoBrainData('human-es')` |
|@lamanno2016molecular | Human embryonic midbrain | 1977 | `LaMannoBrainData('human-embryo')` |
|@lamanno2016molecular | Human induced pluripotent stem cells | 337 | `LaMannoBrainData('human-ips')` |
|@lamanno2016molecular | Mouse adult dopaminergic neurons | 243 | `LaMannoBrainData('mouse-adult')` |
|@lamanno2016molecular | Mouse embryonic midbrain | 1907 | `LaMannoBrainData('mouse-embryo')` |
|@lawlor2017singlecell | Human pancreas | 638 | `LawlorPancreasData()` |
|@leng2015oscope | Human embryonic stem cells | 460 | `LengESCData()` |
|@lun2017assessing | 416B cells | 192 | `LunSpikeInData('416b')` |
|@lun2017assessing | Mouse trophoblasts | 192 | `LunSpikeInData('tropho')` |
|@macosko2015highly | Mouse retina | 49300 | `MacoskoRetinaData()` |
|@marques2016oligodendrocyte | Mouse brain | 5069 | `MarquesBrainData()` |
|@messmer2019transcriptional | Human embryonic stem cells | 1344 | `MessmerESCData()` |
|@muraro2016singlecell | Human pancreas | 3072 | `MuraroPancreasData()` |
|@nestorowa2016singlecell | Mouse haematopoietic stem cells | 1920 | `NestorowaHSCData()` |
|@paul2015transcriptional | Mouse haematopoietic stem cells | 10368 | `PaulHSCData()` |
|@richard2018tcell | Mouse CD8+ T cells | 572 | `RichardTCellData()` |
|@romanov2017molecular | Mouse brain | 2881 | `RomanovBrainData()` |
|@segerstolpe2016single | Human pancreas | 3514 | `SegerstolpePancreasData()` |
|@shekhar2016comprehensive | Mouse retina | 44994 | `ShekharRetinaData()` |
|@usoskin2015unbiased | Mouse brain | 864 | `UsoskinBrainData()` |
|@tasic2016adult | Mouse brain | 1809 | `TasicBrainData()` |
|@xin2016rna | Human pancreas | 1600 | `XinPancreasData()` |
|@zeisel2015brain | Mouse brain | 3005 | `ZeiselBrainData()` |
The `listDatasets()` function returns all available datasets in `r Biocpkg("scRNAseq")`,
along with some summary statistics and the necessary R command to load them.

```{r}
out <- listDatasets()
```

```{r, echo=FALSE}
out <- as.data.frame(out)
out$Taxonomy <- c(`10090`="Mouse", `9606`="Human", `8355`="Xenopus")[as.character(out$Taxonomy)]
out$Call <- sprintf("`%s`", out$Call)
knitr::kable(out)
```
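The hidden chunk above relabels taxonomy IDs through a named-vector lookup. The same idea can be used to filter the listing by species, sketched here on a small mock data frame so that it runs without downloading anything. The three rows and the `Species` column are illustrative assumptions, not the real output of `listDatasets()`; only the `Taxonomy` and `Call` column names are taken from the code above.

```{r}
# Mock stand-in for as.data.frame(listDatasets()); the rows are illustrative only.
out <- data.frame(
    Taxonomy = c(10090L, 9606L, 8355L),
    Call = c("ZeiselBrainData()", "MuraroPancreasData()", "AztekinTailData()"),
    stringsAsFactors = FALSE
)

# Named-vector lookup: index by the taxonomy ID converted to a character key.
species <- c(`10090`="Mouse", `9606`="Human", `8355`="Xenopus")
out$Species <- unname(species[as.character(out$Taxonomy)])

# Keep only the load commands for mouse datasets.
mouse_calls <- out$Call[out$Species == "Mouse"]
mouse_calls
```

With the real listing, a selected command string could then be run with something like `eval(parse(text=mouse_calls[1]))`, on the assumption that `Call` holds a complete R expression as the text above describes.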

If the original dataset was not provided with Ensembl annotation, we can map the identifiers with `ensembl=TRUE`.
Any genes without a corresponding Ensembl identifier are discarded from the dataset.

```{r}
sce <- ZeiselBrainData(ensembl=TRUE)
head(rownames(sce))
```

Functions also have a `location=TRUE` argument that loads in the gene coordinates.

```{r}
sce <- ZeiselBrainData(ensembl=TRUE, location=TRUE)
head(rowRanges(sce))
```

# Adding new data sets

@@ -109,9 +89,4 @@ and creates a `SingleCellExperiment` object.
Potential contributors are recommended to examine some of the existing scripts in the package to pick up the coding conventions.
Remember, we're more likely to accept a contribution if it's indistinguishable from something we might have written ourselves!

As a general rule, 10X Genomics data sets are not suitable for inclusion in this package.
They are either easy to load (e.g., with functions from the `r Biocpkg("DropletUtils")` package),
or they are more appropriately obtained with dedicated 10X packages like `r Biocpkg("TENxPBMCData")` or `r Biocpkg("TENxBrainData")`.
That said, inclusion will be considered if the format has been sufficiently customized by the original authors.

# References
