Add sequence-level annotations #113
Comments
The related refget issue is still open (samtools/hts-specs#626), but a conversation with @andrewyatz confirmed that it will not be addressed in the upcoming RefGet v2 release, and it is not clear whether a RefGet v3 is planned in the near term.
The question for me is whether seqcol would solve the issue for you. If not, we need to consider a next step.
Based on a discussion with @andreasprlic and @ahwagner, we have decided to shelve this project. The rationale follows. A core assumption of seqrepo is that sequences are referenced by computed identifiers and nothing else. It is impossible to preserve this feature while also making sequence identifiers aware of other properties like sequence type, topology/circularity, taxonomy, or anything else. Sequences need to remain as verbatim strings. In principle, properties could be added to the sequence alias records. For example, the alias record could track the sequence type, circularity, strandedness, or anything else. This raises a slew of challenges:
For all of these reasons, we will not be adding sequence properties to seqrepo. Instead, if consumers need to know the sequence type, circularity, or strandedness, they will have to find another source for that info.
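To illustrate the "computed identifiers and nothing else" point, here is a minimal sketch of a digest that is a pure function of the verbatim sequence string. It assumes the GA4GH-style truncated SHA-512 digest (first 24 bytes, base64url-encoded); the function name `sequence_digest` is hypothetical and is not seqrepo's API.

```python
import base64
import hashlib


def sequence_digest(sequence: str) -> str:
    """Hypothetical helper: identifier computed from the sequence string alone.

    Assumption: SHA-512 truncated to 24 bytes and base64url-encoded,
    which yields a 32-character identifier.
    """
    raw = hashlib.sha512(sequence.encode("ascii")).digest()
    return base64.urlsafe_b64encode(raw[:24]).decode("ascii")


# The same string always maps to the same identifier, whether the molecule
# it came from was linear or circular, single- or double-stranded.
linear_plasmid_id = sequence_digest("ACGTACGT")
circular_plasmid_id = sequence_digest("ACGTACGT")
```

Because nothing but the string enters the hash, a property such as circularity cannot be recovered from, or encoded into, the identifier without changing what the identifier means.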
It would be useful for supporting downstream methods (e.g. circular sequence support #70) to store some basic characteristics about a sequence at the sequence level. This MAY be accomplished by adding these annotations to the FASTA key fields.
I think we would minimally like to have:
and in the event it is nucleic acid:
To accomplish this, @ccaitlingo and I discussed extending the store and fetch methods of FastaDir to add these annotations to FASTA keys, in the following format:
>{digest}|{aa / na}|{linear / circular}|{single / double}
or a compressed version of the above (i.e. bitflags). Making this issue for discussion and progress.
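For discussion, the key format and the bitflag variant could be sketched roughly as follows. This is only an illustration under stated assumptions: `SeqAnnotations`, `format_key`, `parse_key`, `SeqFlags`, `to_flags`, and `from_flags` are hypothetical names, the bit assignments are arbitrary, and none of this is existing FastaDir code. It also assumes the digest never contains a `|` character.

```python
import enum
from dataclasses import dataclass

# Allowed values for each field of the proposed key, in field order.
_ALLOWED = (("aa", "na"), ("linear", "circular"), ("single", "double"))


@dataclass(frozen=True)
class SeqAnnotations:
    seq_type: str      # "aa" or "na"
    topology: str      # "linear" or "circular"
    strandedness: str  # "single" or "double"

    def __post_init__(self):
        values = (self.seq_type, self.topology, self.strandedness)
        for value, allowed in zip(values, _ALLOWED):
            if value not in allowed:
                raise ValueError(f"{value!r} is not one of {allowed}")


def format_key(digest: str, ann: SeqAnnotations) -> str:
    """Build '{digest}|{aa / na}|{linear / circular}|{single / double}'."""
    return "|".join([digest, ann.seq_type, ann.topology, ann.strandedness])


def parse_key(key: str):
    """Split an annotated key back into (digest, SeqAnnotations)."""
    digest, seq_type, topology, strandedness = key.split("|")
    return digest, SeqAnnotations(seq_type, topology, strandedness)


class SeqFlags(enum.IntFlag):
    """Hypothetical compressed encoding: one bit per annotation."""
    NUCLEIC_ACID = 1
    CIRCULAR = 2
    DOUBLE_STRANDED = 4


def to_flags(ann: SeqAnnotations) -> int:
    flags = SeqFlags(0)
    if ann.seq_type == "na":
        flags |= SeqFlags.NUCLEIC_ACID
    if ann.topology == "circular":
        flags |= SeqFlags.CIRCULAR
    if ann.strandedness == "double":
        flags |= SeqFlags.DOUBLE_STRANDED
    return int(flags)


def from_flags(value: int) -> SeqAnnotations:
    flags = SeqFlags(value)
    return SeqAnnotations(
        "na" if SeqFlags.NUCLEIC_ACID in flags else "aa",
        "circular" if SeqFlags.CIRCULAR in flags else "linear",
        "double" if SeqFlags.DOUBLE_STRANDED in flags else "single",
    )
```

The verbose form stays human-readable in the FASTA header, while the bitflag form keeps the key short; either way the digest itself is left untouched.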