
Input is a raw feature count matrix, but the QC HTML shows UMI counts as if it were a filtered feature count matrix #140

Closed
JiajiaChen1 opened this issue Jul 20, 2022 · 2 comments
Labels
user question User question about a specific dataset

Comments

@JiajiaChen1

Hi sjfleming,

Thank you very much for developing and sharing this very helpful tool!

Recently, when I ran CellBender on a 10X scRNA-seq sample, it output the following cell probability plot, which seems to contain only a subset of cells (compared with the UMI count range in the CellRanger QC report; CellRanger estimated 18,230 cells in the sample). But I did not do any filtering, and I am sure that the input is the raw_feature_bc_matrix. For this run I used the default --expected-cells and --low-count-threshold. From the log below, it seems CellBender was unable to detect any empty droplets. Does that mean there are no empty droplets in the dataset? If that is the case, I would expect to see cell probability = 1 across the UMI count plot, instead of this "truncated" plot.

I'm curious to hear your thoughts on this. Thanks for your time!

Jiajia

cellbender:remove-background: Including 22222 genes that have nonzero counts.
cellbender:remove-background: Prior on counts in empty droplets is 2696
cellbender:remove-background: Prior on counts for cells is 15646
cellbender:remove-background: Excluding barcodes with counts below 1348
cellbender:remove-background: Using 1263 probable cell barcodes, plus an additional 16364 barcodes, and 0 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1348 UMI counts.

[Screenshot: CellBender cell probability plot]

[Screenshot: CellRanger barcode rank plot]

@sjfleming
Member

Hi @JiajiaChen1 ,

Hopefully we can get this working. So, in the UMI curve you attached from CellRanger, I wonder what you think about the CellRanger cell calls. Do you believe that there are about 18k cells? And then the empty droplets have around 10 counts each? If so, this would be a very very clean dataset.

The other possibility, I think, is that there are more like 3,000 cells in the dataset (that first little bump), and the empty droplets have about 2,000-3,000 UMI counts each. This would be an extremely noisy dataset.
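One way to weigh these two readings is to look directly at the total UMI count per barcode in the raw matrix and count how many barcodes sit at or above a few candidate cutoffs. The sketch below assumes the per-barcode totals are already in a 1-D NumPy array (for example, row sums of the raw barcode-by-gene matrix); the helper name `barcodes_above` and the demo counts are hypothetical and synthetic, not taken from this dataset or from CellBender itself.

```python
import numpy as np


def barcodes_above(umi_counts, thresholds):
    """Count barcodes whose total UMI count is >= each threshold.

    umi_counts: 1-D array of per-barcode total UMI counts (hypothetical input,
    e.g. row sums of a raw barcode-by-gene count matrix).
    thresholds: iterable of UMI cutoffs to check.
    Returns a dict mapping each cutoff to the number of barcodes at or above it.
    """
    counts = np.asarray(umi_counts)
    return {int(t): int(np.count_nonzero(counts >= t)) for t in thresholds}


# Synthetic illustration only (not real data): 3 high-count "cells",
# 4 barcodes on a low plateau near 10 UMIs, and 3 near-zero barcodes.
demo = np.array([20000, 15000, 3000, 12, 10, 9, 11, 2, 1, 0])
print(barcodes_above(demo, [5, 15, 1348]))  # -> {5: 7, 15: 3, 1348: 3}
```

In this toy example, a cutoff of 15 drops every barcode on the ~10-UMI plateau, while a cutoff of 5 keeps them. On real data, a large number of barcodes between 5 and 15 UMIs would point toward the "clean dataset with ~18k cells" reading, since that band would be the empty-droplet plateau. The per-barcode totals could be obtained, for instance, by loading the raw `.h5` with `scanpy.read_10x_h5` and summing over genes.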

Which do you think is more likely in your case?

If you think that this is a very clean dataset with around 18k cells, then I think the best thing to do would be to set --low-count-threshold 5 to help CellBender find the empty droplets correctly. (The default --low-count-threshold is 15, so CellBender is missing the empty-droplet plateau, which sits around 10 counts.) If CellBender still reports only a few thousand "probable cell barcodes", rather than something closer to 18k, then I would additionally try setting --expected-cells to around 17000.
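The two suggested reruns can be sketched as command lines assembled in Python. This assumes CellBender's `remove-background` subcommand takes `--input` (the raw matrix) and `--output` (the result file) alongside the two flags discussed here; the helper `cellbender_command` and both file paths are hypothetical placeholders, not part of CellBender.

```python
import shlex


def cellbender_command(raw_h5, out_h5, low_count_threshold=5, expected_cells=None):
    """Assemble a `cellbender remove-background` command as an argv list.

    raw_h5 / out_h5: placeholder paths for the raw input and the output file.
    low_count_threshold: lowered from the default of 15 so that barcodes on a
    ~10-UMI empty-droplet plateau are not excluded.
    expected_cells: only set if a first run still reports far too few cells.
    """
    argv = [
        "cellbender", "remove-background",
        "--input", raw_h5,
        "--output", out_h5,
        "--low-count-threshold", str(low_count_threshold),
    ]
    if expected_cells is not None:
        argv += ["--expected-cells", str(expected_cells)]
    return argv


# First try: only lower the low-count threshold.
first_try = cellbender_command("raw_feature_bc_matrix.h5", "cellbender_out.h5")
# Second try, if the first still finds only a few thousand probable cells.
second_try = cellbender_command("raw_feature_bc_matrix.h5", "cellbender_out.h5",
                                expected_cells=17000)
print(shlex.join(first_try))
print(shlex.join(second_try))
```

Building an argv list (rather than one string) means it can be passed straight to `subprocess.run(argv, check=True)` without shell quoting issues; `shlex.join` is used here only to display it.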

sjfleming self-assigned this Aug 8, 2023
sjfleming added the user question (User question about a specific dataset) label Aug 8, 2023
@sjfleming
Member

Will close for now. Feel free to re-open.

This kind of issue will probably be greatly improved in v0.3.0.
Closed by #238
