This repository has been archived by the owner on Sep 20, 2024. It is now read-only.

Meaning of "sufficient" in EV tag (Documentation). #30

Open
LeeBergstrand opened this issue Oct 5, 2017 · 10 comments

@LeeBergstrand

LeeBergstrand commented Oct 5, 2017

Given the line "EV IPR010479; PF06393; sufficient;", does the word "sufficient" mean that this piece of evidence (given that the step has multiple pieces of evidence) is sufficient on its own to prove that the step exists?

@LornaMGnify
Contributor

Yes, that is correct.

@LeeBergstrand
Author

@happy-lorna 👍

@LeeBergstrand
Author

LeeBergstrand commented Dec 11, 2018

@happy-lorna @rdfinn

Given the line "EV IPR010479; PF06393; sufficient;", does the word "sufficient" mean that this piece of evidence (given that the step has multiple pieces of evidence) is sufficient on its own to prove that the step exists?

From GenProp0877:

SN  1
ID  Flagellar motor stator protein MotA
DN  Flagellar motor stator protein MotA
RQ  1
EV  IPR002898; PF01618;
EV  IPR022522; TIGR03818;

Do you need both EVs for a yes assignment?

From GenProp0885:

SN  1
ID  Flagellar rod assembly protein/muramidase FlgJ
DN  Flagellar rod assembly protein/muramidase FlgJ
RQ  1
EV  IPR013377; TIGR02541; sufficient;
EV  IPR012823; PF02050; sufficient;

Do you need only one EV for a yes assignment?

If you have a hit for PF01618 but not for TIGR03818 in step one of GenProp0877 should this step be assigned 'NO' since both evidences are required? If you have a hit for TIGR02541 but not PF02050 for step one of GenProp0885 should this step be assigned 'YES' since only one evidence is required?

I have a dataset where we have a hit for PF01618 but not for TIGR03818 in step one of GenProp0877. However, it appears that assign_genome_properties.pl is assigning step one a result of 'YES' even though TIGR03818 is not found in the InterProScanResults.tsv.

PROPERTY: GenProp0877
Flagellar motor stator complex
.	STEP NUMBER: 1
.	STEP NAME: Flagellar motor stator protein MotA
.	.	required
.	.	INTERPRO: IPR002898; PF01618
.	.	INTERPRO: IPR022522; TIGR03818
.	STEP RESULT: yes
.	STEP NUMBER: 2
.	STEP NAME: Flagellar motor stator protein MotB
.	.	required
.	.	INTERPRO: IPR006665; PF00691
.	STEP RESULT: yes
RESULT: YES

Does your Perl code ignore the use of sufficient?
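
For reference, the interpretation described in this comment can be sketched in Python. This is only a minimal sketch, not the logic of assign_genome_properties.pl or Pygenprop: the `Evidence` type and `step_supported` function are hypothetical names, hits are matched on the member database accession (for example PF01618), and the handling of steps that mix flagged and unflagged evidences is an assumption.

```python
from typing import Iterable, NamedTuple, Set


class Evidence(NamedTuple):
    """One EV line of a step (hypothetical representation)."""
    accession: str    # member database accession, e.g. "PF01618"
    sufficient: bool  # True when the EV line carries the "sufficient" flag


def step_supported(evidences: Iterable[Evidence], hits: Set[str]) -> bool:
    """Return True if the step counts as found under the rule discussed here.

    A single hit to a "sufficient" evidence is enough. Evidences without the
    flag must all be hit. Mixing both kinds in one step is resolved by
    checking the sufficient ones first (an assumption of this sketch).
    """
    evidences = list(evidences)
    sufficient = [e for e in evidences if e.sufficient]
    required = [e for e in evidences if not e.sufficient]

    if any(e.accession in hits for e in sufficient):
        return True
    # With no unflagged evidences there is nothing left that could pass.
    return bool(required) and all(e.accession in hits for e in required)


# Step 1 of GenProp0877 as listed above (no "sufficient" flags).
mot_a = [Evidence("PF01618", False), Evidence("TIGR03818", False)]
# Step 1 of GenProp0885 as listed above (both flagged "sufficient").
flg_j = [Evidence("TIGR02541", True), Evidence("PF02050", True)]

print(step_supported(mot_a, {"PF01618"}))    # only one of two required hits
print(step_supported(flg_j, {"TIGR02541"}))  # one sufficient hit
```

Under this reading, the MotA step with only a PF01618 hit would not be supported, while the FlgJ step with only a TIGR02541 hit would be.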

@LeeBergstrand
Author

LeeBergstrand commented Dec 11, 2018

@SilasK This is related to Micromeda/pygenprop#32. This may explain some of the discrepancies between assignments by Pygenprop and assign_genome_properties.pl.

@LornaMGnify
Contributor

Hi Lee,
Your understanding of the use of the "sufficient" flag is correct. However, I see that you are finding discrepancies in your output. Can you please send me the particular protein that you mention as an example so I can investigate a little bit?
Thanks!
Lorna.

@LeeBergstrand
Author

@happy-lorna My apologies, I was off the grid for Christmas.

The proteins can be found at the very bottom of the Gist below (missassigned_proteins.faa). I also included their InterProScan annotations, along with the proteins, InterProScan results, and predicted genome properties of the organism they are from.
https://gist.github.com/LeeBergstrand/3bce6c9b4cbf55e7eb9f7ef8c235c9ab

@LeeBergstrand
Author

@happy-lorna Can you reopen this issue so other people can see it?

@LeeBergstrand
Author

I also found the following in GenomeProperties.pm.

if($evObj->gp){
  if(defined($self->get_defs->{ $evObj->gp })){
    # For properties a PARTIAL or YES result is considered success
    if( $self->get_defs->{ $evObj->gp }->result eq 'YES' or
        $self->get_defs->{ $evObj->gp }->result eq 'PARTIAL' ){
      $succeed++;
    }elsif($self->get_defs->{ $evObj->gp }->result eq 'UNTESTED'){
      $step->evaluated(0);
      #Todo - need to check this bit. Some times a step can have two evidences, so need to check this is okay.
    }
  }
}elsif($evObj->interpro){
  #Need to annotated the sequences
  if(!$self->annotated){
    $self->annotate_sequences
  }
  #See if the accession has been found
  if($self->get_family( $evObj->accession ) ){
    $succeed++;
    last EV;
  }
}else{
  die "unknown evidence object type\n";
}

Line Link:

#Todo - need to check this bit. Some times a step can have two evidences, so need to check this is okay.
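
The InterPro branch quoted above increments `$succeed` and leaves the `EV` loop on the first hit, without consulting the "sufficient" flag. A rough Python rendering of that behaviour follows. It assumes that a non-zero count marks the step as found, which the excerpt does not show, and the function and variable names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def first_hit_step_result(
    evidences: Iterable[Tuple[str, bool]], hits: Set[str]
) -> bool:
    """Mimic the quoted loop: stop at the first evidence that has a hit.

    Each evidence is an (accession, sufficient) pair. The flag is carried
    along but never read, mirroring the quoted InterPro branch.
    """
    succeed = 0
    for accession, _sufficient in evidences:
        if accession in hits:
            succeed += 1
            break  # corresponds to `last EV`
    # Assumption: any success marks the step as found.
    return succeed > 0


# Step 1 of GenProp0877: neither evidence is flagged "sufficient".
mot_a = [("PF01618", False), ("TIGR03818", False)]

# Only PF01618 is present in the results, yet the step comes out as found.
print(first_hit_step_result(mot_a, {"PF01618"}))
```

If that assumption holds, it would be consistent with the 'yes' step result reported for GenProp0877 when TIGR03818 is missing.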

@LornaMGnify LornaMGnify reopened this Jan 9, 2019
@LornaMGnify
Contributor

Hi Lee, there does seem to be a problem with how the code is assessing these "non-sufficient" evidences. I have asked @rdfinn to investigate (specifically the comment line you highlighted, which looks likely to indicate the issue), but he is pretty busy just now, so he won't get to it until the middle of next week. I will follow up here after that.
As an aside, while investigating this I had a closer look at GenProp0877, and the two evidences listed for the MotA step should actually be flagged as "sufficient" anyway. I will update this in the next release.

@rdfinn
Contributor

rdfinn commented Nov 21, 2019

Hi Lee,

Sorry this has been languishing on the to-do pile. Thank you for bringing this to our attention; it is now fixed in the code. We have some other loose ends to finish up, but should then be able to push a release by the end of the year or in early January.

Rob
