Cell probabilities remain close to one #220
Comments
I am not a developer, but based on the graph you show, it seems like the algorithm does not have enough droplets to reach the empty plateau: it stops at 25k total droplets. Try going higher with the
Hi @leprohonmalo , So, from looking at this sample's UMI curve (the top plot), I have kind of a hard time knowing exactly what's going on with the experiment. There are two possible scenarios:
Evidence for scenario (1):
Evidence against scenario (1):
Evidence for scenario (2):
Evidence against scenario (2):
Are you measuring just RNA here, or are there antibody features or ATAC features as well? |
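For readers who want to reproduce the UMI curve discussed above from their own raw counts, here is a minimal sketch in plain Python. It assumes you already have one total UMI count per droplet barcode; the function names and toy numbers are made up for illustration and are not part of CellBender.

```python
def barcode_rank_curve(totals):
    """Sort per-droplet UMI totals in descending order.

    Index i of the returned list holds the UMI count of the droplet
    at rank i + 1, which is what a barcode rank (UMI curve) plot shows.
    """
    return sorted(totals, reverse=True)


def count_at_rank(curve, rank):
    """Return the UMI count at a 1-based rank, or None if out of range."""
    if 1 <= rank <= len(curve):
        return curve[rank - 1]
    return None


# Toy data (hypothetical): two high-count droplets, three mid-count
# droplets, and three very low-count droplets.
toy_totals = [12, 5200, 480, 9, 6100, 510, 11, 495]
curve = barcode_rank_curve(toy_totals)

for rank in (1, 3, 6):
    print(rank, count_at_rank(curve, rank))
```

Reading off the count at a few ranks (for example around 3k, 10k and 50k on real data) is a quick way to see where each plateau starts and ends before choosing droplet-number parameters.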
Hi @sjfleming, thank you very much for your kind help. To answer some of your questions:
Did you not use all the 10x GEM barcoded beads?
Do you expect to see two populations of cells, maybe 3k of one type and 7k of another type, where the first cell type has 5k+ UMI counts and the second cell type has only 500 UMI counts?
Is this all identical T cells, or are there other things in here like granulocytes, which can have very low UMI counts like that?
The amount of ambient RNA would be very low, suggesting this was a very clean experiment. Is that consistent with your expectations?
Are you measuring just RNA here, or are there antibody features or ATAC features as well?
I did not go further. One thing I thought was that there could be around 3000 viable cells, then around 7000 droplets with cell debris, and then an empty droplet plateau after 10k. However, I would maybe expect more than 10 UMI in empty droplets if that was the case. I also have no idea how much stress a cell gets while going through thawing, magnetic sorting, labeling... Is it enough to trigger some apoptosis after we checked the viability?
After what you said, I would lean more towards scenario two. Compared with mouse samples I have, which show obvious background and an empty droplet plateau just below 100, I do not see a lot of background expression when looking at genes like HBB (although a different organism is not optimal for comparison). I also think it is more likely to have 2 different populations rather than few empty droplets. I will try to see if I can isolate this low UMI count population and see how it looks. Considering the two populations also makes the number of cells closer to what we should expect (although I also have other samples with low numbers of cells).
Thank you again for your help. The documentation for cellbender is already quite helpful (although I would suggest adding some examples from GitHub issues to help with troubleshooting) and you have been very active on GitHub issues, which helps a lot in trying to understand what is happening. I really appreciate that.
Hi @leprohonmalo , thanks for your comments, glad to hear those kind things. I think I tend to agree with your assessment after hearing your answers. If this was a "normal" 10x run with PBMCs, then I would expect to see an empty droplet plateau going out to around 70k droplets or so (which is what you see), and from blood with PBMC isolation, I'd also expect the experiment to be pretty clean: I would not expect to see 500 UMI counts in empty droplets (which is what I usually see from very difficult snRNA-seq preps from tough human tissues like heart or blood vessel walls). (Another thing you might be able to check is this: in the raw data, if you look at droplets from the 500 UMI count plateau, do they express new genes that are NOT present in the droplets with 2000+ counts? If the 500 count plateau was really empty droplets, you would not expect to see new genes that didn't come from the cells in the experiment.) So if scenario 2 is the more likely, then I would recommend trying to run cellbender as follows:
where The |
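The raw-data check suggested above (do the droplets on the ~500 UMI plateau express genes that never appear in the droplets with 2000+ counts?) could be sketched roughly as follows. This is a plain-Python illustration, not CellBender code: it assumes each droplet is available as a dict mapping gene name to UMI count, and the `low_range` and `high_min` thresholds are hypothetical values chosen to bracket the two groups mentioned in this thread.

```python
def genes_detected(droplets):
    """Return the set of genes with at least one count in any droplet."""
    detected = set()
    for counts in droplets:
        detected.update(gene for gene, n in counts.items() if n > 0)
    return detected


def split_by_total(droplets, low_range, high_min):
    """Split droplets into a low-count plateau group and a high-count group."""
    low, high = [], []
    for counts in droplets:
        total = sum(counts.values())
        if low_range[0] <= total <= low_range[1]:
            low.append(counts)
        elif total >= high_min:
            high.append(counts)
    return low, high


def genes_only_in_low(droplets, low_range=(300, 800), high_min=2000):
    """Genes seen in the low-count plateau but never in high-count droplets."""
    low, high = split_by_total(droplets, low_range, high_min)
    return genes_detected(low) - genes_detected(high)


# Toy data (hypothetical gene counts, for illustration only).
toy = [
    {"CD3E": 1500, "IL7R": 700},   # total 2200: high-count group
    {"CD3E": 300, "MT-CO1": 250},  # total 550: low-count plateau
    {"CD3E": 5},                   # total 5: ignored by both groups
]
only_low = genes_only_in_low(toy)
print(sorted(only_low))
```

If the low-count plateau were really empty droplets, its RNA would come from the cells in the experiment, so this set would be expected to be close to empty on real data. With real matrices the same set logic applies, but a sparse-matrix library would be far more efficient than dicts.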
Hi @sjfleming, I looked at the 500 UMI population and it is definitely not a second population of cells, as it displays mostly expression of mitochondrial genes (around 50%). So I was wrong, and it seems that scenario 1 is much more likely. What would you advise in this case?
Well, I guess I would not be so sure that the high mitochondrial read fraction is evidence for scenario 1. In fact, if 50% mitochondrial reads is higher than the fraction in the really good cells, then I would suspect that the 500 UMI plateau is NOT empty droplets... precisely because there's something different going on in those 500 UMI count droplets than in the cells. It's kind of like this sort of scenario.
It's hard to say what those droplets actually represent. In my mind, they could be either dying cells (which sometimes have a high mitochondrial read fraction) or some kind of debris-containing droplets that for some reason contain a lot of mitochondria. But I think, for the purposes of CellBender's model, they are not empty. So I would think scenario 2 still applies.
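The comparison described above (is the mitochondrial fraction in the ~500 UMI droplets higher than in the good cells?) can be made concrete with a small sketch. It assumes human gene symbols, where mitochondrial genes carry the `MT-` prefix, and droplets stored as dicts of gene name to UMI count; the function names and toy counts are illustrative assumptions, not part of CellBender.

```python
def mito_fraction(counts, prefix="MT-"):
    """Fraction of a droplet's UMIs that come from mitochondrial genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.upper().startswith(prefix))
    return mito / total


def mean_mito_fraction(droplets, prefix="MT-"):
    """Average mitochondrial fraction over a group of droplets."""
    fractions = [mito_fraction(counts, prefix) for counts in droplets]
    return sum(fractions) / len(fractions) if fractions else 0.0


# Toy groups (hypothetical counts, for illustration only).
low_plateau = [{"MT-CO1": 250, "CD3E": 250}]
good_cells = [{"MT-CO1": 100, "CD3E": 1900}]

print(mean_mito_fraction(low_plateau), mean_mito_fraction(good_cells))
```

A clearly higher fraction in the low-count group than in the good cells would point to those droplets containing something other than ambient RNA, which is the argument made above for treating them as non-empty.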
Thanks for the explanation.
However, the ELBO plot is not perfect. Do you think that I can improve it with parameters like learning-rate? Is it fine to go on with it? I would say that previous epochs still converged.
So I can tell you what's happening here... and it's something I'm currently struggling to fix. The turnaround in the learning curve is due to those initial few droplets with the highest counts getting called "empty" by cellbender. This is something that ideally should NOT happen, and I usually see a kind of dip in the learning curve when it does. I'm working on incorporating some more regularization to make sure it doesn't happen in future versions. (Apart from that handful of high-count droplets, the output should be fine. The high-count droplets that are being called "empty" will have zero counts in the output, which is probably not desirable. They probably represent doublets anyway, but it would still be better if cellbender kept them.) For now, the options would be (1) to reduce the learning rate, maybe
Hi, CART_P88_cellbender_50k_25k_5_2e5.pdf Thank you very much for your help and your explanations. I hope I will be able to run the rest of my samples now. I think I'm good with closing the issue. |
Hi,
Thank you for developing this tool ! I am trying CellBender (using latest docker image) on several human samples containing mostly T cells. We loaded 20 000 cells on the chip aiming to recover 7000-10 000.
For many samples (3/4) I get similar results, where all droplets have cell probabilities close to one.
I ran my sample with the following code:
I saw similar issues suggesting to adjust parameters such as:
--expected-cells (tested 3000; 5000; 10 000)
--total-droplets-included (tested from 10 000 to 40 000)
--low-count-threshold (tested 5; 15)
--empty-drop-training-fraction (tested 0.3; 0.5; 0.7)
--z-dim (tested 50; 100)
--z-layers (tested 250; 500)
However I could not manage to make it work.
What could I do to improve the results? Is there any issue that I am not seeing? (For this specific sample the barcode rank plot maybe has an unusual shape and I recover a low number of cells, but that does not apply to all of my samples.)
Thank you in advance for your help. Tell me if I can add any useful information.