Truncated Cell Barcodes #200

Closed
ayyildizd opened this issue Apr 19, 2023 · 5 comments
Labels: enhancement (New feature or improvement)

@ayyildizd

Hi and thanks for developing this great tool! (looking forward to v3)

I am using CellBender v0.2.2 after processing SPLiT-seq samples with STARsolo. With this technology I have three 8 bp barcodes separated by underscores, so each cell barcode is 26 characters in total. I just realised that CellBender outputs barcodes of only 20 characters, so mine come out truncated. This is a problem because many cell barcodes are now duplicated.

Thanks in advance!
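
To illustrate the problem described above, here is a minimal sketch of how cutting 26-character barcodes down to 20 characters produces duplicates. The barcode sequences are made up for illustration and are not taken from the actual dataset; the only assumption carried over from the report is the layout of three 8 bp segments joined by underscores.

```python
from collections import Counter

# Hypothetical SPLiT-seq style barcodes: three 8 bp rounds joined by underscores.
barcodes = [
    "AACGTGAT_AAACATCG_AACGCTTA",
    "AACGTGAT_AAACATCG_AACGCCAT",
    "AACGTGAT_AAACATCG_AAGACGGA",
    "CATCAAGT_ACAGATTC_AGTACAAG",
]

# 3 segments of 8 characters plus 2 underscores gives 26 characters.
assert all(len(bc) == 3 * 8 + 2 for bc in barcodes)

# Keep only the first 20 characters, as a fixed 20-character field would.
truncated = [bc[:20] for bc in barcodes]

# Barcodes that differ only in the last 6 characters now collide.
counts = Counter(truncated)
duplicates = {bc: n for bc, n in counts.items() if n > 1}

print(f"unique before: {len(set(barcodes))}, unique after: {len(set(truncated))}")
print("collisions:", duplicates)
```

Only the first two characters of the third segment survive the cut, so any cells that share the first two segments and those two characters end up with the same barcode.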

@sjfleming
Member

Ah! We don't want that... I have to admit I am not quite sure why that is happening.

I will try to write myself up a little test case and see if I can reproduce this. Are you using an .h5ad AnnData format file as input for cellbender?

@ayyildizd
Author

I pass the whole output folder from STARsolo as input. The folder contains matrix.mtx, genes.tsv and barcodes.tsv.

@sjfleming
Member

sjfleming commented Apr 24, 2023

Okay thanks for responding.

Whoops, there it is!

delimiter="\t", skip_header=0, dtype='<U20')

So I was going off the spec from CellRanger (since that is technically the format CellBender says it supports). But I'd definitely like to make it work for STAR solo format as well.

For now, one workaround you could use would be to open your dataset with anndata and save a .h5ad file to use as input for cellbender. I think this should work, since I do not believe there is any restriction on the size of barcodes for .h5ad format inputs.

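Here's a rough sketch of how to do that:

$ pip install scanpy
$ python
>>> import scanpy as sc
>>> # point this at the folder containing matrix.mtx, genes.tsv and barcodes.tsv
>>> adata = sc.read_10x_mtx("your_directory_with_the_mtx_file")  # hopefully that works
>>> adata.obs_names.str.len().max()  # sanity check: should be the full barcode length (26 for you)
>>> adata.write("your_file.h5ad")

(then you can use --input your_file.h5ad when running cellbender)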

I think in a future update I can probably change that line to dtype=str, hopefully without any problems. That should take care of any barcode size.
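
As a rough illustration of the difference between the two dtypes, the sketch below reads a small barcodes.tsv with np.genfromtxt twice, once with dtype='<U20' and once with dtype=str. The delimiter, skip_header and dtype arguments mirror the line quoted above; the temporary file and the barcode values are hypothetical, and this is not the actual CellBender loading code.

```python
import os
import tempfile

import numpy as np

# Hypothetical 26-character barcodes (three 8 bp segments joined by underscores).
barcodes = [
    "AACGTGAT_AAACATCG_AACGCTTA",
    "AACGTGAT_AAACATCG_AACGCCAT",
    "CATCAAGT_ACAGATTC_AGTACAAG",
]

with tempfile.TemporaryDirectory() as tmp:
    # Write a one-column barcodes.tsv, one barcode per line.
    path = os.path.join(tmp, "barcodes.tsv")
    with open(path, "w") as fh:
        fh.write("\n".join(barcodes) + "\n")

    # Fixed-width unicode dtype: anything past 20 characters is silently dropped.
    truncated = np.genfromtxt(fname=path, delimiter="\t", skip_header=0, dtype="<U20")

    # Flexible string dtype: the width is taken from the data.
    full = np.genfromtxt(fname=path, delimiter="\t", skip_header=0, dtype=str)

print(truncated.dtype, [len(bc) for bc in truncated])
print(full.dtype, [len(bc) for bc in full])
print("unique barcodes:", len(set(truncated.tolist())), "vs", len(set(full.tolist())))
```

No warning is raised in the fixed-width case, which is presumably why the truncation only showed up later as duplicated barcodes.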

@sjfleming
Member

Thank you for reporting this!

sjfleming self-assigned this on Apr 24, 2023
sjfleming added the bug (Something isn't working) and enhancement (New feature or improvement) labels and removed the bug label on Apr 24, 2023
sjfleming added this to the v0.3.0 milestone on Apr 24, 2023
@sjfleming
Member

Closed by #238
