Human readable producing non-unique folders #63

Open
tseemann opened this issue May 4, 2018 · 3 comments

tseemann commented May 4, 2018

The strain column isn't unique. Might need to detect this, and append the GCF_ number to the strain to discriminate?

/home/tseemann/tmp/B.cereus/human_readable/refseq/bacteria/Bacillus/cereus/E33L/GCF_000011625.1_ASM1162v1_genomic.fna.gz

/home/tseemann/tmp/B.cereus/human_readable/refseq/bacteria/Bacillus/cereus/E33L/GCF_000833045.1_ASM83304v1_genomic.fna.gz
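
A minimal sketch of the suggestion above, not the tool's actual code: detect strain labels that occur more than once and append the assembly accession only to those. The function name `strain_dir_names`, the `(strain, accession)` input shape, and the `UNIQUE_STRAIN` / `GCF_PLACEHOLDER.1` entry are assumptions made for illustration; the two E33L accessions are the ones from the paths above.

```python
from collections import Counter


def strain_dir_names(entries):
    """Return one directory name per (strain, accession) pair.

    Hypothetical helper: a strain label that occurs more than once gets
    the assembly accession appended, so directory names stay unique.
    """
    counts = Counter(strain for strain, _ in entries)
    names = []
    for strain, accession in entries:
        if counts[strain] > 1:
            # Clash: discriminate with the GCF_ accession.
            names.append(f"{strain}_{accession}")
        else:
            # No clash: keep the plain strain label.
            names.append(strain)
    return names


entries = [
    ("E33L", "GCF_000011625.1"),
    ("E33L", "GCF_000833045.1"),
    ("UNIQUE_STRAIN", "GCF_PLACEHOLDER.1"),  # made-up entry, illustration only
]
print(strain_dir_names(entries))
```

With this approach, strains that have a single assembly keep their current folder name, and only clashing labels change.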
@kblin kblin self-assigned this May 8, 2018
@kblin kblin added the bug label May 8, 2018
Owner

kblin commented May 8, 2018

Thanks for the report!

Owner

kblin commented Jan 3, 2020

Hi @tseemann, finally having some time to look at this. I'm beginning to feel like this works as intended ™️. If there are multiple assemblies for a strain, the strain dir will have multiple files. The way I understand your report is that this isn't what you are expecting.

From your perspective, what is the benefit of having two strain_assembly_id folders, rather than two files in a strain folder?

Author

tseemann commented Mar 1, 2020

These are not different assemblies of the same thing though?

https://www.ncbi.nlm.nih.gov/assembly/GCF_000011625.1/
https://www.ncbi.nlm.nih.gov/assembly/GCF_000833045.1/

They are different biosamples.
They just happen to have the same "strain" name but this is not an enforced unique field.
They could have originally come from the same freezer stock, but been passaged?
Some labs use such generic strain IDs that clashes happen all the time.
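
To illustrate the distinction, here is a rough sketch that flags strain labels shared by more than one BioSample, as opposed to one sample with several assemblies. The record layout (dicts with `strain`, `biosample`, `accession` keys) and the function name are assumptions, and the BioSample IDs are placeholders, not the real IDs of these two assemblies.

```python
from collections import defaultdict


def strains_with_multiple_biosamples(records):
    """Map each clashing strain label to its sorted distinct BioSample IDs.

    Hypothetical helper: a strain is reported only when its label is
    attached to more than one BioSample.
    """
    biosamples = defaultdict(set)
    for rec in records:
        biosamples[rec["strain"]].add(rec["biosample"])
    return {
        strain: sorted(ids)
        for strain, ids in biosamples.items()
        if len(ids) > 1
    }


records = [
    # BioSample values are placeholders, not looked up from NCBI.
    {"strain": "E33L", "biosample": "SAMPLE_A", "accession": "GCF_000011625.1"},
    {"strain": "E33L", "biosample": "SAMPLE_B", "accession": "GCF_000833045.1"},
]
print(strains_with_multiple_biosamples(records))
```

A strain that appears twice with the same BioSample would not be reported, which matches the "multiple assemblies of the same thing" case.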
