input a raw feature count but the qc html shows UMI counts as if it were a filtered feature count #140

JiajiaChen1 · 2022-07-20T14:56:54Z

Thank you very much for developing and sharing this very helpful tool!

Recently, when I ran CellBender on a 10X scRNA-seq sample, it output the following cell probability plot, which seems to contain only a subset of the cells (compared with the UMI count range in the CellRanger QC report; CellRanger estimated 18,230 cells in the sample). But I did not do any filtering, and I am sure that the input is the raw_feature_bc_matrix. For this run I used the default --expected-cells and --low-count-threshold. From the following log, it seems CellBender was unable to detect empty droplets. Does that mean there are no empty droplets in the dataset? If so, I would expect to see cell probability = 1 across the UMI count plot, instead of this "truncated" plot.

Curious to see what are your thoughts on this matter. Thanks for your time!

Jiajia

cellbender:remove-background: Including 22222 genes that have nonzero counts.
cellbender:remove-background: Prior on counts in empty droplets is 2696
cellbender:remove-background: Prior on counts for cells is 15646
cellbender:remove-background: Excluding barcodes with counts below 1348
cellbender:remove-background: Using 1263 probable cell barcodes, plus an additional 16364 barcodes, and 0 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1348 UMI counts.
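
For reference, the numbers in this log can be pulled out and cross-checked with a short script. This is only a sketch: the regular expressions are written against the six lines pasted above, and the helper name `parse_log` is made up for illustration (it is not part of CellBender).

```python
import re

# The log lines exactly as pasted above.
LOG = """\
cellbender:remove-background: Including 22222 genes that have nonzero counts.
cellbender:remove-background: Prior on counts in empty droplets is 2696
cellbender:remove-background: Prior on counts for cells is 15646
cellbender:remove-background: Excluding barcodes with counts below 1348
cellbender:remove-background: Using 1263 probable cell barcodes, plus an additional 16364 barcodes, and 0 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1348 UMI counts.
"""


def parse_log(text):
    """Hypothetical helper: extract the integers of interest from the log text."""
    patterns = {
        "empty_prior": r"Prior on counts in empty droplets is (\d+)",
        "cell_prior": r"Prior on counts for cells is (\d+)",
        "cutoff": r"Excluding barcodes with counts below (\d+)",
        "probable_cells": r"Using (\d+) probable cell barcodes",
        "additional": r"plus an additional (\d+) barcodes",
        "empties": r"and (\d+) empty droplets",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            found[key] = int(match.group(1))
    return found


stats = parse_log(LOG)
# Total barcodes that entered the analysis: 1263 + 16364
print(stats["probable_cells"] + stats["additional"])  # 17627
# In this particular log the cutoff equals half the empty-droplet prior
print(stats["cutoff"] * 2 == stats["empty_prior"])  # True
```

Read this way, the log says that about 17.6k barcodes were analyzed, none of them were treated as empty droplets, and everything below 1348 counts was excluded, which would be consistent with the plot looking "truncated".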

CellRanger Barcode Rank Plot

sjfleming · 2022-07-26T13:55:31Z

Hi @JiajiaChen1 ,

Hopefully we can get this working. So, in the UMI curve you attached from CellRanger, I wonder what you think about the CellRanger cell calls. Do you believe that there are about 18k cells? And then the empty droplets have around 10 counts each? If so, this would be a very very clean dataset.

The other possibility I would think is that there are more like 3000 cells in the dataset (that first little bump), and then the empty droplets have about 2000-3000 UMI counts each. This would be an extremely noisy dataset.

Which do you think is more likely in your case?
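
One way to weigh the two possibilities is to read the UMI count off the barcode rank curve at a few ranks. The snippet below is a minimal sketch of that idea. The function name `counts_at_ranks` and the toy numbers are made up for illustration and are not taken from this dataset; with real data, the input would be the per-barcode UMI totals from the raw matrix.

```python
def counts_at_ranks(umi_counts, ranks):
    """Return the UMI count at each 1-based rank of the barcode rank curve.

    Ranks outside the range of available barcodes are skipped.
    """
    ordered = sorted(umi_counts, reverse=True)
    return {r: ordered[r - 1] for r in ranks if 1 <= r <= len(ordered)}


# Toy per-barcode totals (NOT from the dataset in this issue):
# three high-count barcodes, four mid-count barcodes, five low-count barcodes.
toy = [20000, 18000, 15000, 3000, 2800, 2500, 2000, 12, 10, 9, 8, 7]

print(counts_at_ranks(toy, [1, 5, 10]))  # {1: 20000, 5: 2800, 10: 9}
```

On a real sample, ranks just past each candidate cell count (for example a few thousand, and a bit beyond 18k) would show whether the counts there sit in the thousands or near 10.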

If you think that this is a very clean dataset with around 18k cells, then I think the best thing to do would be to set --low-count-threshold 5 to help CellBender find the empty droplets correctly. (The default --low-count-threshold is 15, so CellBender is missing the empty droplet plateau, which is around 10 counts.) If CellBender still tells you that there are only a few thousand "probable cell barcodes", and not something closer to 18k, then I would additionally try setting --expected-cells to maybe 17000.
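
As a rough sketch, the two runs suggested above could be assembled like this. The file names are placeholders and the helper `build_command` is hypothetical; only the flags `--input`, `--output`, `--low-count-threshold`, and `--expected-cells` are meant to reflect the actual `cellbender remove-background` command line.

```python
import shlex


def build_command(input_h5, output_h5, low_count_threshold=None, expected_cells=None):
    """Hypothetical helper: build a `cellbender remove-background` argument list."""
    cmd = [
        "cellbender", "remove-background",
        "--input", input_h5,
        "--output", output_h5,
    ]
    # Only pass the optional flags when a value was given,
    # so that CellBender's own defaults apply otherwise.
    if low_count_threshold is not None:
        cmd += ["--low-count-threshold", str(low_count_threshold)]
    if expected_cells is not None:
        cmd += ["--expected-cells", str(expected_cells)]
    return cmd


# Placeholder paths; substitute the real raw matrix and desired output file.
first_try = build_command(
    "raw_feature_bc_matrix.h5", "cellbender_output.h5", low_count_threshold=5
)
second_try = build_command(
    "raw_feature_bc_matrix.h5", "cellbender_output.h5",
    low_count_threshold=5, expected_cells=17000,
)

print(shlex.join(first_try))
print(shlex.join(second_try))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run`. The second command is only worth trying if the first still reports far fewer than 18k probable cell barcodes.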

sjfleming · 2023-08-08T19:28:30Z

Will close for now. Feel free to re-open.

This kind of issue will probably be greatly improved in v0.3.0
Closed by #238

sjfleming self-assigned this Aug 8, 2023

sjfleming added the "user question" label (User question about a specific dataset) Aug 8, 2023

sjfleming closed this as completed Aug 8, 2023
