How are priors calculated? #147
Comments
Basically, prior on empty droplets somehow becomes more than that on cells. Here's the log:
Hi @apredeus, yes, these priors are currently computed using some heuristics that, honestly, I think are not very robust. Your dataset seems to be an example. The current heuristic for finding empty droplet size (after removing droplets with UMI count below CellBender/cellbender/remove_background/data/dataset.py Lines 1436 to 1438 in cb2d209
I am definitely working to improve the prior calculation heuristics. But in the meantime, I might be able to give a bit more advice if you can post a picture of the UMI curve.
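For anyone wanting to inspect the UMI curve before posting a picture, here is a minimal sketch in plain Python. It is not CellBender's prior heuristic: `umi_curve`, `steepest_drop`, and the `low_count_threshold` parameter are hypothetical names made up for illustration. It only ranks droplets by UMI count and reports where the curve falls most sharply on a log scale.

```python
import math

def umi_curve(umi_counts, low_count_threshold=0):
    """Return (rank, count) pairs sorted by decreasing UMI count.

    Droplets with counts at or below `low_count_threshold` are dropped.
    This parameter is a stand-in for illustration only.
    """
    kept = sorted(
        (c for c in umi_counts if c > low_count_threshold and c > 0),
        reverse=True,
    )
    return [(rank, count) for rank, count in enumerate(kept, start=1)]

def steepest_drop(curve):
    """Return (rank, drop): the rank after which log10(count) falls the most."""
    best_rank, best_drop = None, 0.0
    for (rank, count), (_, next_count) in zip(curve, curve[1:]):
        drop = math.log10(count) - math.log10(next_count)
        if drop > best_drop:
            best_rank, best_drop = rank, drop
    return best_rank, best_drop

# Toy per-droplet UMI totals: a few high-count cells, then a sharp fall.
counts = [5000, 4800, 4500, 60, 50, 40, 3, 2]
curve = umi_curve(counts, low_count_threshold=5)
rank, drop = steepest_drop(curve)
print(f"{len(curve)} droplets kept; steepest drop after rank {rank}")
```

On real data the counts would come from the per-barcode column sums of the count matrix. Plotting rank against count on log-log axes gives the usual UMI curve.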
Great, thank you very much - this clarifies a lot. I'm putting together a database of some 10,000 scRNA-seq datasets at Sanger, and I'd like to use CellBender on all of them. From what we have found so far, about 10-20% of runs are problematic with the current priors (so far I use
Hi @apredeus, sorry for the slow reply, and that sounds fantastic! Very glad to hear you'll be running CellBender on your samples. That's a massive database. I would very much like to get that fraction of problematic runs reduced way below the 10-20% you're currently seeing. I know that makes things painful when you've got a lot of samples. I'll keep you updated about new presets for priors. If you notice any trends, like what you mentioned above happening repeatedly, I'd love to know.
This changed a bit in v0.3.0. The prior-finding logic is now somewhat compartmentalized here. But additionally, there are now input arguments
Closed by #238
Hi,
Could you please comment on how exactly the priors on empty droplets and cells are calculated?
I have an unusual dataset with a very "steep" cell curve: the first ~5k cells have high UMI counts, and then there's a pretty sharp drop. For some reason, the prior on empty droplets is huge, although I'd actually expect it to be rather low. Changing --low-count-threshold does nothing.
Thank you in advance!