Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save paths for alts using plain variant ID (when possible) #4065

Merged
merged 2 commits into from
Sep 21, 2023

Conversation

ldenti
Copy link
Contributor

@ldenti ldenti commented Aug 31, 2023

Hi all,
I needed to store alt paths by variant ID (and not SHA1 hash). To do this without (hopefully) breaking anything, I added a new option to the CLI. I noticed that there is already a function to retrieve the variant ID (if present), so I used it. Any drawbacks? (I'm curious to know why by default it uses the make_variant_id and not the get_or_make_variant_id function)

If you think this can be useful, let me know how to improve it to make it mergeable.

Best,

@adamnovak
Copy link
Member

We can definitely add this. I think we have been always using the hashes on the theory that the variant IDs might not actually be unique across a set of files, or even worse might uniquely identify a variant but not a ref/alt designation or an allele numbering order for the variant. But if you know that's not going to happen, we ought to be able to use the stored IDs.

@adamnovak
Copy link
Member

I've pushed this over to a branch https://github.com/vgteam/vg/tree/plain-altid so I think CI will test it.

@ldenti
Copy link
Contributor Author

ldenti commented Sep 18, 2023

But if you know that's not going to happen, we ought to be able to use the stored IDs.

Well, I'm not sure of that: I always try to make them unique.. But for this reason I believe that an optional arg is the best option.. The default behavior is still hashing the variants, but if a user needs to use plain IDs, he can but he must be aware that IDs must be unique. This shouldn't break anything in vg and will place the burden on the user that doesn't use default parameters (I can add this warning to the --help).. Don't know if this is good or not btw (in terms of vg project)

Copy link
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@ldenti
Copy link
Contributor Author

ldenti commented Sep 19, 2023

I've updated the --help.

@adamnovak adamnovak merged commit 2eac75a into vgteam:master Sep 21, 2023
2 checks passed
@adamnovak
Copy link
Member

OK, I've merged this in.

But there are several other places, besides in vg construct, where we unconditionally call make_variant_id() and expect to find the resulting path in the graph. The most important is probably here, where we are making the haplotype paths for the GBWT.

So unless we also add this option to the other subcommands, or we figure out how to make them look for the matching path by ID if they can't find it by hash, then graphs made with this new option won't really work with those other operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants