Can I use output from cellranger-arc in cellbender ? #121
Comments
I'm not part of the cellbender team, but yes you can. I've done it before and the program works just fine. Just out of curiosity, do you plan to remove the ATAC peaks from the raw matrix before running cellbender or leave them in? I've been leaving them in, but I would be curious how removing them affects filtering and downstream processing. |
ok thanks. I was able to run cellbender. I did not remove the ATAC peaks. I did not get any errors. |
Sorry, that's a question for the dev team. My files usually end up larger, so I wouldn't be too worried about it unless you see something really strange in the data. |
This is a great catch @hemantgujar and @mhulke ... it took me a very long time to notice this for some reason. But I finally figured it out. The cellranger people apparently spent a lot of time thinking about and optimizing the compression of their HDF5 files, it turns out! And I did not... but my next update will fix this issue, by compressing the HDF5 files in the same optimized way that cellranger does. Long story short, the fact that the outputs are so much larger than the inputs is the result of sub-optimal compression by me, but it doesn't affect the data in any way. The output files do contain a few more things, like the per-cell embeddings and cell probabilities and some more latent variables. But these are pretty small in terms of file size, and they do not account for the big discrepancy. Once this fix is put in place, the outputs will be smaller files than the inputs. |
As for the cellranger-arc ATAC data, I will have to look into that. I am glad to hear the code runs without error, but I wonder if it is altering the ATAC data? It looks like |
Hello, there |
Hello! |
To avoid having ATAC peaks mess with the Cellbender output, I usually feed my scRNA data through Cellranger separately from Cellranger-arc, and then use that h5 file in Cellbender. Afterward, pairing up the RNA and ATAC data using the barcodes downstream is fast and easy. You could also remove the peak data from the arc h5 before feeding it into Cellbender. |
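The barcode pairing mentioned above could look roughly like the sketch below. It assumes that both runs report the same barcode string for the same droplet, which is worth verifying on your own data. The function name `pair_barcodes` and the toy barcodes are made up for illustration and are not part of CellBender or Cell Ranger.

```python
def pair_barcodes(rna_barcodes, atac_barcodes):
    """Return (rna_idx, atac_idx) for barcodes present in both assays.

    The order follows the RNA list, so position k in both index lists
    refers to the same droplet.
    """
    # Map each ATAC barcode to its column position for O(1) lookup.
    atac_pos = {bc: i for i, bc in enumerate(atac_barcodes)}
    rna_idx, atac_idx = [], []
    for i, bc in enumerate(rna_barcodes):
        j = atac_pos.get(bc)
        if j is not None:
            rna_idx.append(i)
            atac_idx.append(j)
    return rna_idx, atac_idx


# Toy barcodes (hypothetical, for illustration only).
rna = ["AAAC-1", "AAAG-1", "AACT-1"]
atac = ["AACT-1", "AAAC-1", "TTTG-1"]
rna_idx, atac_idx = pair_barcodes(rna, atac)
```

The two index lists can then be used to subset the columns of the CellBender-corrected RNA matrix and of the ATAC matrix so that they line up droplet by droplet.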
@smilesw , sorry for the delay, somehow I missed your comment. With my next update, I will make this much easier by allowing users to choose which features they want to include in the analysis. |
@vincycheng in your case it sounds like cellbender's cell calling might not have worked very well on your dataset. Sometimes a sub-optimal run results in cellbender saying "all droplets contain cells" when really they don't. If you can include a screenshot of the output PDF, maybe I can make a few suggestions about how to modify the parameters. It could be the case that the ATAC peak information threw off cellbender... I have not done testing myself to know if this is a problem. If you wanted to try removing the ATAC data, you could do either as @mhulke suggested, or you could try to use an h5ad file and manually remove the ATAC features, as I suggested in the previous post. It's probably worth a try! |
Thank you @sjfleming for your reply. I removed the ATAC peak information and ran cellbender. The result came out a bit more "reasonable": ~30,000 cells were called. There seem to be some other issues with this dataset, but I will open another discussion once I gather more information. Thanks again! |
Hi @sjfleming, I had the same issue with cellbender (default parameters and cellranger-arc estimated cells) outputting many more cells than the original cellranger-arc output. And I did remove peak features and saved data into an additional |
Hi @jingxinfu ! I have maybe a few comments on these three runs for which you attached output PDFs:
|
Thanks for your suggestions, @sjfleming! I will test it out for the first two samples. For the third sample, when I compared the UMI changes between CellRanger ARC and CellBender, I found that CellBender tends to keep more mitochondrial-associated UMIs. |
Hi @jingxinfu , for that third sample, if you mean that CellBender is identifying more droplets as "cells", and that those extra "cells" compared to CellRanger ARC tend to have more mitochondrial gene expression, then I think you're seeing something very common. Let me explain: CellBender will tend to keep everything that is a "non-empty" droplet. We usually refer to these as "cells" and call the quantity the "cell probability", but what it really is, is the probability that a droplet is "not empty". It doesn't actually mean that the droplet contains a high quality cell. We do urge people to perform cell quality control (QC) after running CellBender, in order to eliminate precisely those kinds of things: debris and dead/dying cells that might have very high mitochondrial gene counts. Here's an example from the public 10x Genomics human-mouse mixture dataset. In that plot, CellBender has identified all droplets that are not colored gray as "non-empty". The gray droplets have been identified by CellBender as "empty". I believe this is actually the correct outcome. But you can see that the "arms" in the plot with high mitochondrial read fraction (red) are clearly something very different from the "good quality cells" that have low mitochondrial read fraction. So, even though CellBender (correctly) identifies those (red) droplets as "non-empty", they should probably be filtered out by a cell QC step before the data are used in downstream analyses, since those high mitochondrial count droplets are probably not what you'd be interested in for further analysis. |
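A post-CellBender QC step of the kind described above might be sketched as follows. This is only an illustration under stated assumptions: the thresholds `min_prob=0.5` and `max_mito=0.2` are arbitrary placeholders, the `MT-` gene-name prefix is an assumption that depends on the reference (mixed-species references often prepend a genome name), and the function names are hypothetical. Real data would normally be held in a sparse matrix rather than nested lists.

```python
def mito_fraction(counts_row, gene_names, prefix="MT-"):
    """Fraction of a droplet's counts that come from mitochondrial genes."""
    total = sum(counts_row)
    if total == 0:
        return 0.0
    mito = sum(
        c for c, g in zip(counts_row, gene_names)
        if g.upper().startswith(prefix)  # assumed naming convention
    )
    return mito / total


def qc_keep(counts, gene_names, cell_prob, min_prob=0.5, max_mito=0.2):
    """Keep droplets that are likely non-empty AND have low mito fraction."""
    return [
        p >= min_prob and mito_fraction(row, gene_names) <= max_mito
        for row, p in zip(counts, cell_prob)
    ]


# Toy data: 3 droplets x 3 genes (values made up for illustration).
genes = ["MT-CO1", "ACTB", "GAPDH"]
counts = [
    [1, 5, 4],   # non-empty, 10% mitochondrial
    [8, 1, 1],   # non-empty, 80% mitochondrial (likely debris or dying cell)
    [0, 0, 0],   # empty droplet
]
cell_prob = [0.99, 0.99, 0.01]
keep = qc_keep(counts, genes, cell_prob)
```

The point of combining the two conditions is that the "cell probability" only answers whether a droplet is non-empty; the mitochondrial filter is a separate judgment about cell quality.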
That's good to know. Thank you so much for the explanation in detail! |
Hi @sjfleming! Following the use of CellBender with the h5 generated by Thanks in advance! |
Hi @ccruizm , that feature is not yet implemented in an official release. But I will make sure to include it in v0.3.0, which I anticipate releasing by March 10. My development version of that feature is the argument The command would be something like
|
The idea would be that, in the output file, the |
Thanks for the speedy reply! I ran both h5 containing the peaks and without them and saw differences as others reported. But having an h5 output also preserving the ATAC information is better! Going to test it now! 🤓 |
I haven't tested it yet myself, so please do let me know if it works / doesn't! Hopefully I can make any adjustments and get it working. Out of curiosity, how would you describe the differences you saw when including / excluding the |
Thank you so much for running that comprehensive test! Shoot, okay. I am going to have to do some debugging here. The fact that the code runs fine when you manually subset to RNA means that there is a bug somewhere in my exclusion of features from the analysis. I will track it down! |
Any chance you could run it with the flag
|
Sure! here it is :) |
Thanks @ccruizm ! Unfortunately I will need to do more digging to track this down, but I will keep you posted. I will make sure it's working for the v0.3.0 release |
Hi, I ran into the same issue of trying to ignore "Peaks" from Cellranger-arc multiomic data. Perhaps a quick way to do this currently is to list all the ATAC-feature indices (they are usually listed after all the genes in the Cellranger output features.tsv.gz) and supply that list to the --blacklist-genes option? Would this work? Best, Kunal |
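Listing the ATAC feature indices, as proposed above, could be done along these lines. The sketch assumes the usual Cell Ranger layout of `features.tsv.gz`, where the third tab-separated column is the feature type and ATAC rows are labeled `Peaks`. The helper names are hypothetical, the indices are 0-based row positions, and whether `--blacklist-genes` accepts them in this form has not been confirmed in this thread.

```python
import csv
import gzip


def read_features(path):
    """Read a gzipped, tab-separated features file into a list of rows."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def peak_indices(rows, peak_type="Peaks"):
    """Return 0-based row indices whose feature type (column 3) is peak_type."""
    return [
        i for i, row in enumerate(rows)
        if len(row) > 2 and row[2] == peak_type
    ]


# In-memory stand-in for read_features("features.tsv.gz"); rows are made up.
rows = [
    ["ENSG00000243485", "MIR1302-2HG", "Gene Expression"],
    ["ENSG00000237613", "FAM138A", "Gene Expression"],
    ["chr1:10109-10357", "chr1:10109-10357", "Peaks"],
    ["chr1:180730-181630", "chr1:180730-181630", "Peaks"],
]
indices = peak_indices(rows)
```

Selecting by the feature-type column rather than assuming that peaks always follow the genes avoids depending on row order.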
Or is there a fix available now? |
@kunalkathuria I think that your suggestion would work (potentially!). But let me know if you get a chance to try running the code on the |
Closed by #238 |