length of the barcode #150

Closed
xul0621 opened this issue Aug 26, 2022 · 5 comments

@xul0621

xul0621 commented Aug 26, 2022

Hi,

I am running CellBender on my snRNA-seq data. According to the sequencing company, each droplet has a 27-letter barcode. After running CellBender, I found that only the first 20 letters were kept, so the barcodes are no longer unique. I checked the parameters, but there seems to be no parameter to adjust this.

I have no idea how to solve this problem. Could you give me any suggestions?

Thank you in advance!

@sjfleming
Member

Hi @xul0621, sorry for the delayed response.

What is the file format for your dataset? CellBender can load several datatypes, and once I know which file format you used, then maybe I can look and see what happened. Off the top of my head, it doesn't seem like CellBender should be truncating the barcodes. It's supposed to leave them as they are, and not have any hard-coded requirements about length.

@xul0621
Author

xul0621 commented Sep 22, 2022

Thanks for your reply.
I found that if I use the folder path, which contains barcodes.tsv, genes.tsv and matrix.mtx, as the input, the barcodes in the CellBender output (the h5 file) have only 20 letters. In this case, the log file reports the CellRanger format as v2.
Therefore, I tried using the h5 format as the input. This time, the output barcodes have 27 letters. I tested both versions of the h5 input, and neither v2 nor v3 affects the CellBender results.
I guess the problem is in the part of the algorithm that reads from the folder path.
Thanks again for your attention.

@sjfleming
Member

Hi @xul0621, thank you very much for letting me know that. That is very helpful information. I will be sure to take another look at the data loader for the folder path, and I'll modify it so that it doesn't have any hard-coded expectations about barcode length.

And thanks for posting the fix about using the h5 file!

@sjfleming sjfleming self-assigned this Sep 22, 2022
@sjfleming sjfleming added the bug Something isn't working label Sep 22, 2022
@sjfleming sjfleming added this to the v0.3.0 milestone Sep 22, 2022
@sjfleming
Member

Note to self: yes, here it is, the dtype is the problem... I may have copied this from 10x. dtype=str will probably fix this.

barcodes = np.genfromtxt(fname=barcode_file,
                         delimiter="\t", skip_header=0, dtype='<U20')
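
For anyone reading along, here is a minimal sketch of the truncation described in the note above. It is not CellBender's actual loader: the barcodes are made up, the `load_barcodes` helper is hypothetical, and the data is read from an in-memory buffer instead of a real barcodes.tsv. It only illustrates how a fixed-width `'<U20'` dtype silently cuts 27-character strings to 20 characters, while `dtype=str` lets NumPy size the strings from the data.

```python
import io

import numpy as np

# Hypothetical 27-character barcodes that share their first 20 characters.
barcode_text = (
    "ACGTACGTACGTACGTACGTAAAAAAA\n"
    "ACGTACGTACGTACGTACGTCCCCCCC\n"
)


def load_barcodes(text, dtype):
    """Read one barcode per line, mimicking the call quoted in the issue."""
    return np.genfromtxt(fname=io.StringIO(text),
                         delimiter="\t", skip_header=0, dtype=dtype)


# Fixed-width unicode dtype: anything past 20 characters is dropped.
truncated = load_barcodes(barcode_text, dtype="<U20")

# Unsized string dtype: NumPy picks a width that fits the longest entry.
full = load_barcodes(barcode_text, dtype=str)

print(truncated.dtype, [len(b) for b in truncated])
print(full.dtype, [len(b) for b in full])
print("unique after truncation:", len(set(truncated.tolist())))
print("unique with dtype=str:", len(set(full.tolist())))
```

With these made-up barcodes, the two truncated entries collapse into the same string, which matches the loss of uniqueness reported at the top of this thread.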

@sjfleming sjfleming mentioned this issue Mar 28, 2023
@sjfleming sjfleming mentioned this issue Aug 6, 2023
@sjfleming
Member

Closed by #238
