Support for non-H5 inputs #38

jpcartailler · 2019-10-22T17:11:08Z

Greetings,

I'm very excited to try this approach, but I can't get our data into it. Our data comes from the inDrops method. I went through the trouble of passing the data through Seurat/LOOM to generate .h5 files, which unfortunately do not seem to be compatible with CellBender (see the "ValueError: blocks must be 2-D" error below).

Is there any chance that you could introduce a more generic/accessible format that could be used as CellBender input? Ultimately, we all start with barcodes and genes. A sparse matrix would be convenient, for example.

Alternatively, if you know of a good way to load inDrops data into CellBender, then that would really make my day!

JP

cellbender:remove-background: Command:
cellbender remove-background --input data.KO_Gene_new.cells.h5ad --output output.h5 --cuda --expected-cells 500 --total-droplets-included 1000 --epochs 100
cellbender:remove-background: 2019-10-22 11:59:42
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from file data.KO_Gene_new.cells.h5ad
cellbender:remove-background: CellRanger v2 format
Traceback (most recent call last):
  File "C:\Users\c\AppData\Local\Continuum\miniconda3\envs\CellBender\Scripts\cellbender-script.py", line 11, in <module>
    load_entry_point('cellbender', 'console_scripts', 'cellbender')()
  File "c:\users\c\cellbender\cellbender\base_cli.py", line 101, in main
    cli_dict[args.tool].run(args)
  File "c:\users\c\cellbender\cellbender\remove_background\cli.py", line 92, in run
    main(args)
  File "c:\users\c\cellbender\cellbender\remove_background\cli.py", line 185, in main
    run_remove_background(args)
  File "c:\users\c\cellbender\cellbender\remove_background\cli.py", line 143, in run_remove_background
    args.low_count_threshold)
  File "c:\users\c\cellbender\cellbender\remove_background\data\dataset.py", line 82, in __init__
    self._load_data()
  File "c:\users\c\cellbender\cellbender\remove_background\data\dataset.py", line 125, in _load_data
    self.data = get_matrix_from_cellranger_h5(self.input_file)
  File "c:\users\c\cellbender\cellbender\remove_background\data\dataset.py", line 874, in get_matrix_from_cellranger_h5
    count_matrix = sp.vstack(csc_list, format='csc')
  File "C:\Users\c\AppData\Local\Continuum\miniconda3\envs\CellBender\lib\site-packages\scipy\sparse\construct.py", line 499, in vstack
    return bmat([[b] for b in blocks], format=format, dtype=dtype)
  File "C:\Users\c\AppData\Local\Continuum\miniconda3\envs\CellBender\lib\site-packages\scipy\sparse\construct.py", line 548, in bmat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D


johnchamberlin · 2019-11-01T19:54:15Z

I was about to post the same question. But I learned that you can use .mtx format as input, which might be easier to synthesize than .h5. See the example here:
https://cellbender.readthedocs.io/en/latest/getting_started/remove_background/index.html

The only hiccup was that the genes/features file has to be named "genes.tsv", not "features.tsv". I am using STARsolo instead of CellRanger, and STARsolo writes "features.tsv".
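
For anyone hitting the same naming hiccup, here is a minimal sketch of the workaround in Python. The helper name `add_genes_tsv` is made up for illustration, and it assumes the directory holds an uncompressed "features.tsv" next to the matrix file. It copies rather than renames, so the original STARsolo output stays untouched.

```python
import shutil
from pathlib import Path


def add_genes_tsv(matrix_dir):
    """Copy features.tsv to genes.tsv in the same directory.

    Hypothetical helper, not part of CellBender or STARsolo.
    The original features.tsv is left in place.
    """
    matrix_dir = Path(matrix_dir)
    src = matrix_dir / "features.tsv"
    dst = matrix_dir / "genes.tsv"
    if not src.exists():
        raise FileNotFoundError(f"no features.tsv found in {matrix_dir}")
    # Do not overwrite a genes.tsv that is already there.
    if not dst.exists():
        shutil.copyfile(src, dst)
    return dst


# Usage (the path is a placeholder for your own output directory):
# add_genes_tsv("path/to/starsolo/matrix/dir")
```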

sjfleming · 2019-11-06T18:04:44Z

Yes, at the moment, the easiest approach is to try to get your data into the format of either CellRanger v2 or CellRanger v3, in their mtx format.

The .h5 file input expects the format to be exactly as CellRanger has it, so that's a bit more of a pain to pull off. The sparse mtx and tsv format should work for you though.

The two CellRanger versions are a bit different (v2 has genes.tsv, v3 has features.tsv.gz), but if you can get your data into that fairly generic format, you can use the tool directly. Check out
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.1/output/matrices
for details on formatting for CellRanger v2.
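
As a rough sketch of what "getting your data into that format" can look like, the snippet below writes a count matrix to a CellRanger v2-style directory with scipy. The function name `write_cellranger_v2_dir` and its arguments are hypothetical. It assumes the v2 layout described at the link above: "matrix.mtx" with genes as rows and barcodes as columns, "genes.tsv" with a gene ID and gene name per line, and "barcodes.tsv" with one barcode per line.

```python
from pathlib import Path

import numpy as np
import scipy.io
import scipy.sparse as sp


def write_cellranger_v2_dir(counts, gene_ids, gene_names, barcodes, out_dir):
    """Write a genes-by-barcodes count matrix as matrix.mtx, genes.tsv, barcodes.tsv.

    Hypothetical helper for illustration, not a CellBender function.
    """
    counts = sp.coo_matrix(counts).astype(np.int64)
    if counts.shape != (len(gene_ids), len(barcodes)):
        raise ValueError("counts must have shape (n_genes, n_barcodes)")
    if len(gene_ids) != len(gene_names):
        raise ValueError("gene_ids and gene_names must have the same length")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Matrix Market coordinate format; force "general" so a square matrix
    # that happens to be symmetric is still written out in full.
    scipy.io.mmwrite(str(out_dir / "matrix.mtx"), counts,
                     field="integer", symmetry="general")

    with open(out_dir / "genes.tsv", "w") as handle:
        for gene_id, gene_name in zip(gene_ids, gene_names):
            handle.write(f"{gene_id}\t{gene_name}\n")

    with open(out_dir / "barcodes.tsv", "w") as handle:
        for barcode in barcodes:
            handle.write(f"{barcode}\n")

    return out_dir
```

The resulting directory would then be passed to `--input` in place of the .h5 file.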

@c5creative We are interested in adding more compatibility for other file formats in the future. Could you point me to some documentation of the file specification for your InDrops format (and maybe a public example data file)?

achamess · 2019-11-24T16:33:19Z

Just chiming in. I also used STARsolo. I had to manually change features.tsv to genes.tsv, and then it worked. Alternatively, one can use DropletUtils to make h5 files. Either way, there is an intermediate step from the outputs of STARsolo to CellBender. Good tool, btw. Really cleans up my data.

jpcartailler · 2019-11-27T14:38:49Z

Thanks, everyone, for the feedback!

@sjfleming - With regard to the inDrop data - in our case, we originally used the inDrops pipeline for the data processing/filtering. I can share some output with you (it's large), but in short, it's a tab-delimited file with barcodes as rows and genes/features as columns.

Since one can provide mtx, I think that is sufficient as a generic means to load data.
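
A table like the one described (barcodes as rows, genes as columns) can be turned into the genes-by-barcodes sparse matrix that the mtx format expects. The sketch below is only an illustration under stated assumptions: the first line is a header whose first cell labels the barcode column, every other cell is an integer count, and the function name `read_barcode_by_gene_table` is made up. It streams the file line by line and keeps only nonzero entries, since these files can be large.

```python
import csv

import scipy.sparse as sp


def read_barcode_by_gene_table(path):
    """Read a tab-delimited barcodes-by-genes table into a genes-by-barcodes matrix.

    Hypothetical helper; assumes a header row of gene names and integer counts.
    Returns (counts, genes, barcodes) with counts in CSC format.
    """
    rows, cols, vals = [], [], []
    barcodes = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        genes = header[1:]  # first header cell labels the barcode column
        for record in reader:
            if not record:
                continue  # skip blank lines
            barcode_index = len(barcodes)
            barcodes.append(record[0])
            for gene_index, field in enumerate(record[1:]):
                count = int(field)
                if count:
                    # Transpose on the fly: genes become rows, barcodes columns.
                    rows.append(gene_index)
                    cols.append(barcode_index)
                    vals.append(count)
    counts = sp.coo_matrix(
        (vals, (rows, cols)),
        shape=(len(genes), len(barcodes)),
        dtype="int64",
    ).tocsc()
    return counts, genes, barcodes
```

The returned matrix can then be saved with `scipy.io.mmwrite` alongside the gene and barcode lists.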

@achamess - great tip, I went back and reprocessed with STARsolo and now am making progress with CellBender

sjfleming · 2020-01-07T16:26:20Z

We are currently adding functionality to read inputs in the DGE matrix format from dropseq, and if there's interest, we could add a file parser for inDrop data as well. But glad to hear you've made progress.

Hrovatin · 2020-10-21T18:34:40Z

If further input and output formats are added, h5ad might be a good choice for both.

sjfleming · 2020-10-27T19:35:37Z

Interesting point... it would require the user to have anndata and h5py installed, but it could be doable...

sjfleming · 2021-05-03T18:44:05Z

The h5ad addition is now live thanks to @jacobkimmel

The next commit will also add support for the DropSeq file format, which is a zipped dense count matrix in tabular form, much like the transpose of the inDrop format above.

Let me know if there is still desire for the inDrop format you've shown @c5creative

sjfleming · 2023-08-08T18:59:05Z

Closed by #238

sjfleming self-assigned this Jul 29, 2020

sjfleming added this to the v0.2 milestone Jul 29, 2020

sjfleming added the enhancement New feature or improvement label Aug 19, 2020

sjfleming modified the milestones: v0.2, v0.1, v0.2.1 May 3, 2021

sjfleming mentioned this issue Mar 28, 2023

v0.3.0 #189

Closed

sjfleming mentioned this issue Aug 6, 2023

v0.3.0 #238

Merged

sjfleming closed this as completed Aug 8, 2023
