what to put in expected cells and total droplets? #137

dm8000 · 2022-07-14T18:16:48Z

Hello

In our experiment, we loaded 10k cells. After running the data through cellranger, ~5k cells were counted. So I'm not sure what to put for expected cells or for total droplets.

sjfleming · 2022-07-15T21:36:04Z

Hi @dm8000 ,

There are some general recommendations here
https://cellbender.readthedocs.io/en/latest/usage/index.html#recommended-best-practices

If you loaded 10k cells, then you should be able to use --expected-cells 10000 just fine. If the UMI curve looks more like there are 5k cells in reality, then it's also fine to use --expected-cells 5000. The algorithm should also be able to come up with a reasonable default if you don't specify anything for expected-cells (though if the dataset is really challenging, it might struggle).

As for total droplets, that is the total number of droplets that cellbender analyzes. For each analyzed droplet, cellbender determines a probability that the droplet contains a cell. All droplets that are NOT among the analyzed droplets are assumed to be empty. So if you make --total-droplets-included too small, and there are still some cells past that number of droplets on the UMI curve, those cells get treated as empty, and you will be giving cellbender a bad prior on empty droplet gene expression.

I usually look at the UMI curve and pick a number where the droplets look "surely empty": past all the cells, though you don't have to go too far into the "empty droplet plateau". Depending on the dataset, 20k or 30k is often enough. For the 10x genomics pbmc8k dataset (v2 chemistry), 12k is enough (https://cf.10xgenomics.com/samples/cell-exp/2.1.0/pbmc8k/pbmc8k_web_summary.html). It depends on what the UMI curve looks like.
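To make the "surely empty" check on the UMI curve a bit more concrete, here is a rough Python sketch. It is not cellbender's own logic: the function names, the `max_ratio` threshold, and the synthetic counts are all made up for illustration. It just ranks droplets by UMI count and asks whether the droplet at the proposed cutoff has far fewer UMIs than a typical cell.

```python
def umi_curve(umi_counts):
    """Return per-droplet UMI counts sorted from highest to lowest."""
    return sorted(umi_counts, reverse=True)


def count_at_rank(curve, rank):
    """UMI count of the droplet at a 1-based rank (0 if past the end)."""
    return curve[rank - 1] if 1 <= rank <= len(curve) else 0


def looks_surely_empty(curve, expected_cells, total_droplets, max_ratio=0.1):
    """Hypothetical sanity check, NOT part of cellbender.

    Passes when the cutoff lies past the expected cells and the droplet at
    the cutoff has at most `max_ratio` times the UMIs of a typical cell.
    `max_ratio` is an arbitrary illustrative threshold.
    """
    # Use the droplet halfway through the expected cells as a "typical cell".
    typical_cell = count_at_rank(curve, max(1, expected_cells // 2))
    at_cutoff = count_at_rank(curve, total_droplets)
    return total_droplets > expected_cells and at_cutoff <= max_ratio * typical_cell


# Synthetic example: 5,000 cells with 8,000 UMIs, 30,000 empties with 100 UMIs.
curve = umi_curve([100] * 30000 + [8000] * 5000)

print(looks_surely_empty(curve, expected_cells=5000, total_droplets=20000))
print(looks_surely_empty(curve, expected_cells=5000, total_droplets=4000))
```

With real data you would still want to plot the curve and look at it; a check like this is only a guard against a cutoff that obviously lands among the cells.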

You can also not specify --total-droplets-included and let cellbender try to use a default.
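As a small illustration of the two ways to run it, here is a Python sketch that builds the `cellbender remove-background` command either with both values given or with both left to the defaults. The helper name and the file names are placeholders, not anything from cellbender itself; only the `--input`, `--output`, `--expected-cells`, and `--total-droplets-included` flags are real.

```python
import shlex


def build_remove_background_command(input_h5, output_h5,
                                    expected_cells=None,
                                    total_droplets_included=None):
    """Build the argument list for `cellbender remove-background`.

    Hypothetical helper for illustration. A flag is only added when a value
    is given, so leaving a value as None lets cellbender pick its default.
    """
    cmd = ["cellbender", "remove-background",
           "--input", input_h5,
           "--output", output_h5]
    if expected_cells is not None:
        cmd += ["--expected-cells", str(expected_cells)]
    if total_droplets_included is not None:
        cmd += ["--total-droplets-included", str(total_droplets_included)]
    return cmd


# Placeholder file names; substitute your own cellranger output.
explicit = build_remove_background_command(
    "raw_feature_bc_matrix.h5", "cellbender_output.h5",
    expected_cells=5000, total_droplets_included=20000)
defaults = build_remove_background_command(
    "raw_feature_bc_matrix.h5", "cellbender_output.h5")

print(shlex.join(explicit))
print(shlex.join(defaults))
```

Either list can be passed to `subprocess.run(cmd, check=True)` to actually launch the run, or you can simply copy the printed command into a terminal.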

Feel free to ask any follow-up questions.

The long-term goal here is to improve cellbender's heuristics for auto-finding these values until they're good enough that users hardly ever need to input the values themselves. We are not quite there yet at this point.

sjfleming · 2023-08-08T19:26:28Z

Finally achieved some progress on improving those automatic heuristics. The command can now pretty reliably be run without specifying anything for --expected-cells or --total-droplets-included if you don't want to.

Closed by #238

sjfleming closed this as completed Aug 8, 2023
