Support for non-H5 inputs #38
Comments
I was about to post the same question, but I learned that you can use the .mtx format as input, which might be easier to synthesize than .h5. See the example here:
The only hiccup was that the genes/features file has to be named "genes.tsv", not "features.tsv". I am using STARsolo rather than CellRanger, and STARsolo writes "features.tsv".
Yes, at the moment, the easiest approach is to try to get your data into the format of either CellRanger v2 or CellRanger v3, in their
The .h5 file input expects the format to be exactly as CellRanger has it, so that's a bit more of a pain to pull off. The sparse
The two CellRanger versions are a bit different (v2 has
@c5creative We are interested in adding more compatibility for other file formats in the future. Could you point me to some documentation of the file specification for your InDrops format (and maybe a public example data file)?
Just chiming in. I also used STARsolo. I had to manually change features.tsv to genes.tsv, and then it worked. Alternatively, one can use DropletUtils to make h5 files. Either way, there is an intermediate step between the outputs of STARsolo and CellBender. Good tool, btw. Really cleans up my data.
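For anyone arriving here from STARsolo, the rename described above can be scripted. The following is a minimal sketch, not part of CellBender: the function name make_v2_style_copy and its arguments are hypothetical, and it assumes the STARsolo output directory holds uncompressed matrix.mtx, barcodes.tsv and features.tsv files. It copies instead of renaming in place, so the original output stays untouched.

```python
import shutil
from pathlib import Path


def make_v2_style_copy(solo_dir, out_dir):
    """Copy STARsolo matrix files into out_dir, renaming features.tsv to genes.tsv.

    Hypothetical helper: the file names below are assumptions about the
    STARsolo output layout, not something CellBender provides.
    """
    solo_dir, out_dir = Path(solo_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Source name -> destination name; only the feature file is renamed.
    renames = {
        "matrix.mtx": "matrix.mtx",
        "barcodes.tsv": "barcodes.tsv",
        "features.tsv": "genes.tsv",
    }
    for src_name, dst_name in renames.items():
        src = solo_dir / src_name
        if not src.is_file():
            raise FileNotFoundError(f"expected {src} in the STARsolo output")
        shutil.copyfile(src, out_dir / dst_name)
    return out_dir
```

The resulting directory would then be passed as the input in place of the raw STARsolo output. The file contents are copied unchanged, which matches what the commenters above report doing by hand.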
Thanks for everyone's feedback!
@sjfleming - With regard to the inDrop data: in our case, we originally used the inDrops pipeline for the data processing/filtering. I can share some output with you (it's large), but in short, it's a tab-delimited file with barcodes as rows and genes/features as columns.
@achamess - great tip, I went back and reprocessed with STARsolo and am now making progress with CellBender.
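A dense table like the one described (barcodes as rows, genes as columns) can in principle be turned into the sparse .mtx layout mentioned earlier in the thread. The sketch below is an illustration under stated assumptions, not an official converter: the function name dense_tsv_to_mtx_dir is hypothetical, the table is assumed to have a header row of gene names and a first column of barcodes, counts are assumed to be integers, and genes.tsv is written with the gene name repeated in both columns because the table carries no separate gene IDs.

```python
import csv
from pathlib import Path


def dense_tsv_to_mtx_dir(tsv_path, out_dir):
    """Convert a barcodes-by-genes count table into matrix.mtx, genes.tsv, barcodes.tsv.

    Hypothetical helper. Assumes a header row of gene names, one row per
    barcode, and integer counts.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    barcodes = []
    entries = []  # (gene_index, barcode_index, count), 1-based as Matrix Market requires
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        genes = header[1:]  # assumption: the first header cell labels the barcode column
        for row in reader:
            if not row:
                continue
            barcodes.append(row[0])
            barcode_idx = len(barcodes)
            for gene_idx, value in enumerate(row[1:], start=1):
                count = int(value)
                if count != 0:
                    # Transposed on output: matrix rows are genes, columns are barcodes.
                    entries.append((gene_idx, barcode_idx, count))

    with open(out_dir / "matrix.mtx", "w") as mtx:
        mtx.write("%%MatrixMarket matrix coordinate integer general\n")
        mtx.write(f"{len(genes)} {len(barcodes)} {len(entries)}\n")
        for gene_idx, barcode_idx, count in entries:
            mtx.write(f"{gene_idx} {barcode_idx} {count}\n")

    with open(out_dir / "genes.tsv", "w") as out:
        for gene in genes:
            out.write(f"{gene}\t{gene}\n")  # no separate gene ID available, so the name is repeated
    with open(out_dir / "barcodes.tsv", "w") as out:
        for barcode in barcodes:
            out.write(f"{barcode}\n")
    return len(genes), len(barcodes), len(entries)
```

Only nonzero counts are held in memory and written out, which keeps the output small for typical droplet data, but the input is still read one full row at a time.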
We are currently adding functionality to read inputs in the DGE matrix format from dropseq, and if there's interest, we could also add a file parser for inDrop data. But glad to hear you've made progress.
If further input and output formats are added, h5ad might be a good choice for both as well.
Interesting point... it would require the user to have
The
The next commit will also add support for the DropSeq file format, which is a zipped dense count matrix in tabular form, much like the transpose of the
Let me know if there is still desire for the
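To make the DropSeq description concrete, here is a rough sketch of reading such a file into sparse triplets. It is not CellBender's parser: the function name read_dge_gz is hypothetical, "zipped" is assumed to mean gzip, and the table is assumed to be tab-delimited with genes as rows, barcodes as columns, a header row of barcodes and integer counts.

```python
import csv
import gzip


def read_dge_gz(path):
    """Read a gzipped genes-by-barcodes count table into sparse triplets.

    Hypothetical helper. Assumes a tab-delimited table whose header row lists
    barcodes and whose first column lists gene names.
    Returns (genes, barcodes, triplets) with 0-based (gene, barcode, count) triplets.
    """
    genes, triplets = [], []
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        barcodes = next(reader)[1:]  # skip the label of the gene-name column
        for row in reader:
            if not row:
                continue
            genes.append(row[0])
            gene_idx = len(genes) - 1
            for barcode_idx, value in enumerate(row[1:]):
                count = int(value)
                if count:
                    # Keep only nonzero counts so the result stays sparse.
                    triplets.append((gene_idx, barcode_idx, count))
    return genes, barcodes, triplets
```

The triplets can be handed to any sparse-matrix constructor that accepts coordinate data; which one to use is left open here.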
Closed by #238
Greetings,
I am very excited to try this approach, but I can't seem to get our data into it. Our data comes from the InDrops method. I did go through the trouble of passing the data through Seurat/LOOM to generate .h5 files, which unfortunately do not seem to be compatible with CellBender (see the "ValueError: blocks must be 2-D" error, below).
Is there any chance that you could introduce a more generic/accessible format that could be used as CellBender input? Ultimately, we all start with barcodes and genes. A sparse matrix would be convenient, for example.
Alternatively, if you know of a good way to load inDrops data into CellBender, then that would really make my day!
JP