Notating Localization Ambiguity in ProForma #30

Open
acesnik opened this issue Apr 3, 2018 · 1 comment

acesnik commented Apr 3, 2018

Overview

We should build a way to specify ambiguity of localization in ProForma:

  1. Modifications localized to one of several amino acid options (e.g., a phosphorylation on a T, S, or Y in a proteoform)
  2. Regions of ambiguity, such as an unidentified mass on a fragment
  3. Ambiguous localization along a whole proteoform sequence (see also issue #21, "Global modifications (modifications for a specific AA over the whole sequence)")

Proposal 1: Four new keys

Add four new keys to specify ambiguity:

  1. #, noting one of several sites that may be assigned a single modification
  2. ->, noting the left boundary of a range of the sequence to which a modification may be localized
  3. <-, noting the right boundary of such a range
  4. <->, noting a modification has ambiguous localization along a whole proteoform sequence, used before the first amino acid of the sequence

The value of each key-value pair is a unique string that groups the ambiguous localization sites.

Examples:

  1. PROT[Phospho|#:eg]EOS[#:eg]FORMS[#:eg]
  • This sequence has a single phosphorylation with ambiguous localization on T4, S7, or S12.
  • To exclude S7 from this group, e.g., because an identified internal fragment rules it out, omit its tag: PROT[Phospho|#:eg]EOSFORMS[#:eg].
  2. PROT[mass:19|->:A]EOSFORMS[<-:A]
  • This sequence has a modification with ambiguous localization across a range.
  • Note S7 is included in this group.
  • The value A of the key-value pairs groups the two boundary tags. This allows overlapping regions to be disambiguated.
  3. [mass:19|Phospho|<->:]PROTEOSFORMS
  • Some number of modifications are completely unlocalized, e.g., when identified by MS1 only.
  • The value of descriptors with the key "<->" can be any string, since the groupings are not important. (The trailing colon is kind of ugly; Proposal 3 addresses it.)
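
To make the key:value idea concrete, here is a minimal Python sketch of how a reader might pull the ambiguity groups out of a Proposal 1 string. Everything in it is an assumption for illustration only: the function name parse_proposal1, the convention that a tag's position is the 1-based index of the residue it follows (0 for a tag before the first residue), and the simple tokenizer, which ignores any ProForma features not shown in the examples above.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of any existing ProForma library.
AMBIGUITY_KEYS = {"#", "->", "<-", "<->"}

def parse_proposal1(proforma):
    """Return (bare_sequence, groups) for a Proposal 1 style string.

    groups maps (key, value) to the list of tag positions, where a position
    is the 1-based index of the residue the tag follows (0 = before residue 1).
    """
    sequence = []
    groups = defaultdict(list)
    # Each token is either a bracketed tag or a single residue letter.
    for token in re.findall(r"\[[^\]]*\]|[A-Z]", proforma):
        if not token.startswith("["):
            sequence.append(token)
            continue
        position = len(sequence)
        for descriptor in token[1:-1].split("|"):
            # Split only on the first colon: "->:A" gives key "->", value "A".
            key, _, value = descriptor.partition(":")
            if key in AMBIGUITY_KEYS:
                groups[(key, value)].append(position)
    return "".join(sequence), dict(groups)

print(parse_proposal1("PROT[mass:19|->:A]EOSFORMS[<-:A]"))
# ('PROTEOSFORMS', {('->', 'A'): [4], ('<-', 'A'): [12]})
print(parse_proposal1("[mass:19|Phospho|<->:]PROTEOSFORMS"))
# ('PROTEOSFORMS', {('<->', ''): [0]})
```

Because every ambiguity marker is an ordinary key, the parser needs no special cases beyond the set of recognized keys, which is the "easier to amend rules" argument for this proposal.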

Proposal 2: Special prefixes and suffixes

This proposal places more emphasis on human readability for annotating localization ambiguity and less emphasis on continuing the key:value structure from the first version of ProForma.

Add four special strings to group ambiguity. These are followed or preceded by a unique string grouping the ambiguous localization sites.

  1. "#" as a prefix
  2. "->" as a suffix
  3. "<-" as a prefix
  4. "<->" alone

Examples:

  1. PROT[Phospho|#eg]EOS[#eg]FORMS[#eg]
  2. PROT[mass:19|A->]EOSFORMS[<-A]
  3. [mass:19|Phospho|<->]PROTEOSFORMS

Note:
There are currently no Unimod entries that contain these special prefixes, suffixes, or standalone strings, but if one were introduced, it would cause a collision.
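
For comparison, a rough sketch of how a Proposal 2 descriptor might be classified. The function name classify_descriptor and its return shape are made up for this example. It also shows where the collision risk comes from: any modification name that happened to start with "#" or "<-", or end with "->", would be read as an ambiguity marker.

```python
def classify_descriptor(descriptor):
    """Return (marker, group_label) for a Proposal 2 ambiguity descriptor,
    or None if the descriptor is an ordinary one (e.g. a modification name).

    Hypothetical helper for illustration only.
    """
    # Check the standalone string first: "<->" also starts with "<-"
    # and ends with "->", so the order of these tests matters.
    if descriptor == "<->":
        return ("<->", "")
    if descriptor.startswith("#"):
        return ("#", descriptor[1:])
    if descriptor.startswith("<-"):
        return ("<-", descriptor[2:])
    if descriptor.endswith("->"):
        return ("->", descriptor[:-2])
    return None

for d in ["#eg", "A->", "<-A", "<->", "Phospho", "mass:19"]:
    print(d, classify_descriptor(d))
```

The ordering of the checks is the main design point: without the exact "<->" test up front, the standalone string would be misread as a right boundary with the label ">".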

Proposal 3: New keys and one special string

This proposal is a compromise between the two previous proposals: it keeps the key:value structure of Proposal 1 for the site and range keys, but uses the standalone special string from Proposal 2 for annotating global modifications.

Add three new keys to specify ambiguity:

  1. #, noting one of several sites that may be assigned a single modification
  2. ->, noting the left boundary of a range
  3. <-, noting the right boundary of a range

Add one special string to specify unlocalized modifications:
  4. <->, noting a modification has ambiguous localization along a whole proteoform sequence, used before the first amino acid of the sequence

The value of each key-value pair is a unique string that groups the ambiguous localization sites.

Examples:

  1. PROT[Phospho|#:eg]EOS[#:eg]FORMS[#:eg]
  2. PROT[mass:19|->:A]EOSFORMS[<-:A]
  3. [mass:19|Phospho|<->]PROTEOSFORMS

Example 3 differs from Proposal 1 by dropping the colon character.
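
Since the only syntactic difference from Proposal 1 is that colon, converting between the two is mechanical. A small sketch, assuming a hypothetical helper named proposal1_to_proposal3 and the same simplified bracket syntax as the examples above:

```python
import re

def proposal1_to_proposal3(proforma):
    """Rewrite every '<->:value' descriptor as the bare special string '<->'.

    Hypothetical helper for illustration; the value is discarded because
    groupings are not important for the '<->' key.
    """
    def fix_tag(match):
        descriptors = match.group(1).split("|")
        rewritten = ["<->" if d.startswith("<->:") else d for d in descriptors]
        return "[" + "|".join(rewritten) + "]"

    # Apply the rewrite inside each bracketed tag; residues are left untouched.
    return re.sub(r"\[([^\]]*)\]", fix_tag, proforma)

print(proposal1_to_proposal3("[mass:19|Phospho|<->:]PROTEOSFORMS"))
# [mass:19|Phospho|<->]PROTEOSFORMS
print(proposal1_to_proposal3("PROT[mass:19|->:A]EOSFORMS[<-:A]"))
# PROT[mass:19|->:A]EOSFORMS[<-:A]
```

The "#", "->", and "<-" descriptors pass through unchanged, since Proposals 1 and 3 write them identically.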

acesnik commented Apr 4, 2018

Maybe that was a bit much to start discussion. Here are the main questions:

  1. Do you like using "#" and the arrows to specify ambiguity?
  2. Do you want to keep building on the key:value structure (easier to amend rules; Proposals 1 and 3), or do you want to use special prefixes and suffixes (easier for humans to read; Proposal 2)?
