Cell probabilities' output interpretation #151
Hi @learning-MD, sorry for the late response. Looks like a cool dataset! Very nice. You should be able to use cellbender if you want to (it just won't remove much), so we should be able to get it working. Your interpretation is correct: cellbender did not really work on your dataset, since it thought everything was a cell. My guess is that the problem was that cellbender had trouble finding the "ambient" empty droplet plateau. I'm not sure why the ambient plateau is so small in this dataset. If I were you, I might try setting Let me know if that doesn't work!
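One way to make "it thought everything was a cell" concrete is to count how many analyzed droplets exceed a probability cutoff. The following is only a minimal sketch, not CellBender's own code: the function name `summarize_cell_calls`, the 0.5 cutoff, and the toy arrays are assumptions for illustration, and the per-droplet cell probabilities are assumed to have already been loaded into a NumPy array.

```python
import numpy as np


def summarize_cell_calls(cell_prob, threshold=0.5):
    """Count how many analyzed droplets exceed a cell-probability cutoff.

    `cell_prob` is assumed to be a 1-D array of per-droplet posterior cell
    probabilities; `threshold` is an illustrative cutoff (an assumption).
    """
    p = np.asarray(cell_prob, dtype=float)
    n_cells = int(np.sum(p > threshold))
    fraction = n_cells / p.size if p.size else float("nan")
    return {"n_droplets": int(p.size), "n_cells": n_cells, "fraction_cells": fraction}


# Toy data (made up): a run that separates cells from empty droplets...
separated = np.concatenate([np.full(5000, 0.99), np.full(15000, 0.01)])
# ...versus a run where every analyzed droplet looks like a cell.
overcalled = np.full(20000, 0.99)

print(summarize_cell_calls(separated))
print(summarize_cell_calls(overcalled))
```

A fraction close to 1 across all analyzed droplets would match the situation described above, where no empty-droplet population was identified.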
@sjfleming - Stephen, thanks for the recommendations. When I ran with
Rather than reducing the learning rate, I tried what you suggested with It looks more consistent with what I'd expect, but I'm not sure how to interpret that error with the first recommendation you had. Additionally, the ELBO plot looks funkier than what I've typically seen. The input was the raw .h5 file that CellRanger spit out. It does include both gene expression and multiplexing data, so I'm not sure if that had any influence. Any insight/suggestions you have would be appreciated. Thanks!
Yikes, that first error is not something I like to see. Looks like it ran into some kind of numerical instability. Not sure why that happened... The ELBO does look a bit strange on the training set. When you say gene expression and multiplexing data, what is the multiplexing data part? Are we talking about antibody capture or ATAC data, or what are those extra features? How many features are there in addition to gene expression?
@sjfleming These were just hashtag oligos (TotalSeq C), so not CITE-seq. We did 5' scRNA-seq + HTOs to multiplex 5 samples into one lane, targeting ~4000-6000 cells per individual sample. So, a total of 5 different antibody tags. Not sure if that helps or not.
Okay I see, so the count matrix is mainly just gene expression features (plus the 5 TotalSeq_C features). That seems very reasonable. Maybe I could suggest two other things to try:
@sjfleming - apologies for the delay. I dropped z-dim and z-layers to their defaults (100 and 500, respectively) while keeping the low count at 5 (this is where I previously had errors). Below is what I did:
Below is the QC of the output. I feel that this looks a lot more like what I expect the results to look like and am relatively happy with it. It looks like decreasing the z-dim and z-layers provided more stability. Curious to get your thoughts as well. Thanks!
Hi @learning-MD, well it looked like the first plot you sent was pretty encouraging. I'd say that run looks just fine. But yeah, that last plot looks like cellbender was unable to locate the empty droplets appropriately. The learning curve actually looks fine to me, I think that's okay. But it'd be nice if it got the empty droplets right. The only thing I see in that last run that you might be able to try is to use more I need to make a few changes to cellbender so that tweaks like this become unnecessary, but currently, I sometimes see the effect you're seeing (all droplets are being called cells) get corrected if more droplets are included in the analysis.
@sjfleming Thanks! I extended the total droplets to 80000 instead: Seems like extending the total droplets may be the way to proceed with the hashed samples where CellBender calls everything a cell. The learning curve looks okay, but the test curve seems to dip slightly. Not sure what to make of it, but I think this is a successful run? Thanks.
Okay excellent! Yes that definitely counts as a successful run. In my experience sometimes the test ELBO can meander around a little bit like that, and it's nothing to worry about. Great!
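To see why including more droplets can help when the ambient plateau is small, it may be useful to check how many of the included droplets fall on the low-count part of the UMI rank curve. This is a rough sketch under stated assumptions, not part of CellBender: `included_droplet_summary` and `empty_umi_cutoff` are hypothetical names, the cutoff is a user-chosen guess at what an empty droplet looks like, and the counts are synthetic.

```python
import numpy as np


def included_droplet_summary(umi_counts, total_droplets_included, empty_umi_cutoff):
    """Describe the top-N droplets on the UMI rank curve.

    `umi_counts` holds total UMIs per barcode (non-empty array assumed);
    `empty_umi_cutoff` is a hypothetical, user-chosen count at or below
    which a droplet is presumed empty.
    """
    ranked = np.sort(np.asarray(umi_counts))[::-1]  # highest counts first
    included = ranked[:total_droplets_included]
    return {
        "n_included": int(included.size),
        "min_umi_included": int(included[-1]),
        "n_presumed_empty": int(np.sum(included <= empty_umi_cutoff)),
    }


# Synthetic counts (made up): 30,000 cell-like barcodes, 100,000 low-count barcodes.
counts = np.concatenate([np.full(30000, 5000), np.full(100000, 50)])

fewer = included_droplet_summary(counts, 35000, empty_umi_cutoff=100)
more = included_droplet_summary(counts, 80000, empty_umi_cutoff=100)
print(fewer)
print(more)
```

In this toy case the larger setting brings in ten times as many presumed-empty droplets, which is the kind of extra ambient signal the suggestion above relies on.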
Some aspects of training have been tweaked in v0.3.0 in such a way as to make the learning curve more reliable and hopefully mitigate these kinds of issues in the future. Closed by #238
Hi,
Thank you for this great tool! It's been very useful to me in single-nucleus RNA-seq analyses in the past. I'm currently trying it out on a hashed PBMC dataset (5 samples hashed together, aiming for ~5000 cells/individual sample). In general, without using CellBender, the QC of the sample looks good, and I am able to visualize clear, distinct clusters with known PBMC markers. There's not much of an ambient plateau in the CellRanger 7.0 output:
With that in mind and expecting minimal background contamination, I ran CellBender with the following code:
When interrogating the output, it looks like CellBender is overcalling cells:
Am I interpreting that correctly? That there truly is minimal background contamination in this hashed sample? The output looks very different from what I see when working with solid tissue and doing single-nucleus RNA-seq. When I perform downstream analysis of this PBMC output in Seurat, I have ~23,000 singlets for the 5 hashed samples that cluster as follows:
So, I do not think there's any QC failure on the wet lab or library prep side of things. Any suggestions for improvement would be greatly appreciated. Thanks!
Edit: If it helps, this is a superloaded experiment. With downstream QC, I removed ~9000 doublets and the total number of singlets was ~23000. The goal was to target ~4000-6000 cells per hashtagged sample.