what to put in expected cells and total droplets? #137

Closed
dm8000 opened this issue Jul 14, 2022 · 2 comments

Comments

@dm8000

dm8000 commented Jul 14, 2022

Hello

In our experiment, we loaded 10k cells. After running the data through cellranger, ~5k cells were counted. So I'm not sure what to put for expected cells or total droplets.

@sjfleming
Member

Hi @dm8000 ,

There are some general recommendations here
https://cellbender.readthedocs.io/en/latest/usage/index.html#recommended-best-practices

If you loaded 10k cells, then you should be able to use --expected-cells 10000 just fine. If the UMI curve looks more like there are 5k cells in reality, then it's also fine to use --expected-cells 5000. The algorithm should also be able to come up with a reasonable default if you don't specify anything for --expected-cells (though if the dataset is really challenging, it might struggle).

As for total droplets, --total-droplets-included is the total number of droplets that cellbender analyzes. For each analyzed droplet, cellbender determines the probability that it contains a cell. All droplets that are NOT among the analyzed droplets are assumed to be empty. So if you make --total-droplets-included too small, and there are some cells past that number of droplets on the UMI curve, then you will be giving cellbender a bad prior on empty droplet gene expression. I usually look at the UMI curve and try to pick a number where the droplets are "surely empty": past all the cells, though you don't have to go too far into the "empty droplet plateau". Depending on the dataset, 20k or 30k is often enough. For the 10x genomics pbmc8k dataset (v2 chemistry), 12k is enough (https://cf.10xgenomics.com/samples/cell-exp/2.1.0/pbmc8k/pbmc8k_web_summary.html). It depends on what the UMI curve looks like.
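One way to sanity-check a candidate value against the UMI curve is sketched below. This is only an illustration, not cellbender's own heuristic: the function name `umi_counts_at_ranks` is hypothetical, and the per-barcode UMI totals are assumed to already be available as a plain list. It sorts the counts in descending order and reports the UMI count at each candidate rank, so you can see whether a value such as 20k lands past the cells and in the empty droplet plateau.

```python
from typing import Dict, Iterable, List


def umi_counts_at_ranks(counts: Iterable[int], ranks: List[int]) -> Dict[int, int]:
    """Return the UMI count found at each 1-based rank of the UMI curve.

    Hypothetical helper for illustration only; it is not part of cellbender.
    Ranks beyond the number of barcodes are reported as 0.
    """
    # The UMI curve is the per-barcode UMI totals sorted from largest to smallest.
    curve = sorted(counts, reverse=True)
    return {r: (curve[r - 1] if 1 <= r <= len(curve) else 0) for r in ranks}


# Toy data (made up): 5 cells, 10 ambient-only droplets, 20 near-empty barcodes.
toy_counts = [5000] * 5 + [80] * 10 + [1] * 20
result = umi_counts_at_ranks(toy_counts, [5, 8, 15, 40])
print(result)
```

In this toy example, rank 5 still sits on the cells while ranks 8 and 15 sit on the plateau, so a candidate in that range would be "past all the cells".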

You can also not specify --total-droplets-included and let cellbender try to use a default.
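As a rough sketch of the options above, the snippet below assembles a `cellbender remove-background` command with or without the two optional flags. The helper name `build_remove_background_args` and the file names are made up for illustration; only `--input`, `--output`, `--expected-cells` and `--total-droplets-included` are assumed from the cellbender command line, and the command is printed rather than executed.

```python
from typing import List, Optional


def build_remove_background_args(
    input_path: str,
    output_path: str,
    expected_cells: Optional[int] = None,
    total_droplets_included: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for `cellbender remove-background`.

    Hypothetical helper for illustration; optional flags are left out
    when their value is None so that cellbender falls back to its defaults.
    """
    args = [
        "cellbender", "remove-background",
        "--input", input_path,
        "--output", output_path,
    ]
    if expected_cells is not None:
        args += ["--expected-cells", str(expected_cells)]
    if total_droplets_included is not None:
        args += ["--total-droplets-included", str(total_droplets_included)]
    return args


# File names below are placeholders, not paths from this issue.
explicit = build_remove_background_args(
    "raw_feature_bc_matrix.h5", "output.h5",
    expected_cells=5000, total_droplets_included=20000,
)
defaults = build_remove_background_args("raw_feature_bc_matrix.h5", "output.h5")
print(" ".join(explicit))
print(" ".join(defaults))
```

The resulting list could be passed to `subprocess.run` once the values look right for your dataset.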

Feel free to ask any follow-up questions.

The long-term goal here is to improve cellbender's heuristics for auto-finding these values until they're good enough that users hardly ever need to input the values themselves. We are not quite there yet at this point.

@sjfleming
Member

Finally achieved some progress on improving those automatic heuristics. The command can now pretty reliably be run without specifying anything for --expected-cells or --total-droplets-included if you don't want to.

Closed by #238
