Can pygenprop replace assign_genome_properties.pl ? #32

Closed
SilasK opened this issue Dec 5, 2018 · 9 comments

Comments

@SilasK

SilasK commented Dec 5, 2018

If I understand your code correctly, you can parse the long-format output of assign_genome_properties.pl from Genome Properties, but there is no script to infer genome properties directly from InterProScan output.

@SilasK SilasK changed the title Can pygenprop replace assign_genomeproperties.pl ? Can pygenprop replace assign_genome_properties.pl ? Dec 5, 2018
@LeeBergstrand
Collaborator

Hi @SilasK,

That is correct. The original plan was simply to parse the long-format output from assign_genome_properties.pl, and that is what the code can currently do.

However, I noticed that the long-format file only contained results for lower-level genome properties, such as Systems and Pathways, and not for all types of genome properties in the tree (e.g. Categories). See the diagram below:

[Diagram: genome properties types]

Since my visualization software uses all levels of genome properties, I had to write code to do my own assignments for higher-level properties.

My assignment code can be found in this file: https://github.com/Micromeda/pygenprop/blob/master/pygenprop/results.py

Specifically, the following functions:

  • assign_results_to_property_and_children()
  • assign_step_result()
  • assign_property_result_from_required_steps()
  • assign_result_from_child_assignment_results()

These functions could potentially be used to assign genome property results from InterProScan output.
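The higher-level assignment idea can be illustrated with a small sketch that rolls child results up to their parent property. The names `Result`, `combine_child_results`, and `assign_tree`, the dict-based tree, and the roll-up rule (YES if every child is YES, NO if every child is NO, PARTIAL otherwise) are all hypothetical and chosen for illustration; they are not taken from pygenprop's results.py, whose actual rules may differ.

```python
from enum import Enum


class Result(Enum):
    """Hypothetical result states for a genome property."""
    YES = "YES"
    PARTIAL = "PARTIAL"
    NO = "NO"


def combine_child_results(child_results):
    """Roll child property results up into a parent result.

    Assumed rule (illustrative only): YES if all children are YES,
    NO if all children are NO, PARTIAL for any other mix.
    """
    results = list(child_results)
    if not results:
        # Assumption: a node with no children and no precomputed result is NO.
        return Result.NO
    if all(r is Result.YES for r in results):
        return Result.YES
    if all(r is Result.NO for r in results):
        return Result.NO
    return Result.PARTIAL


def assign_tree(tree, leaf_results):
    """Assign a result to every node of a (hypothetical) property tree.

    `tree` maps a property ID to its child property IDs (leaves map to []).
    `leaf_results` holds precomputed results for leaf properties.
    Results are memoized, so a property shared by several parents is computed once.
    """
    assignments = dict(leaf_results)

    def visit(prop_id):
        if prop_id in assignments:
            return assignments[prop_id]
        children = tree.get(prop_id, [])
        result = combine_child_results(visit(child) for child in children)
        assignments[prop_id] = result
        return result

    for prop_id in tree:
        visit(prop_id)
    return assignments


# Example: a category whose two child pathways disagree.
example_tree = {"CATEGORY_A": ["PATHWAY_1", "PATHWAY_2"], "PATHWAY_1": [], "PATHWAY_2": []}
example_leaves = {"PATHWAY_1": Result.YES, "PATHWAY_2": Result.NO}
print(assign_tree(example_tree, example_leaves)["CATEGORY_A"])  # Result.PARTIAL
```

Memoizing in a single dict keeps each property's result computed once, which matters if the property graph is a DAG rather than a strict tree.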

@LeeBergstrand
Collaborator

LeeBergstrand commented Dec 9, 2018

To make assignments directly from InterProScan results, one would need to do the following. Note: this is based on my basic understanding of the Perl code in assign_genome_properties.pl. I still need to reverse-engineer it further to understand it better.

  1. Take the InterProScan TSV and extract the column with InterPro identifiers.
  2. Reduce this list of InterPro identifiers to a unique set.
  3. Grab the leaf genome properties (from your Genome Properties Tree object model), iterate through each of their steps, and check whether each step's evidence identifiers are in the set of InterProScan identifiers.
  4. Assign steps as YES or NO accordingly (filling out the step assignments dict).
  5. Assign leaf-level genome properties (could use the existing functions above).
  6. Assign higher-level genome properties from the leaf properties (could use the existing functions above; it would be best to create the same output that https://github.com/Micromeda/pygenprop/blob/master/pygenprop/assignment_file_parser.py produces and pass this in).
  7. Check that these assignments match those from assign_genome_properties.pl.
  8. Write some unit tests.
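Steps 1 to 5 can be sketched roughly as follows. This assumes the standard InterProScan TSV layout, where the 12th column holds the InterPro accession (or `-` when a match has none), and it models a leaf property's steps as a hypothetical dict of step number to `{"required": bool, "evidence": set}`. A step counts as YES if any of its evidence identifiers were found. The leaf rule (YES if all required steps are YES, PARTIAL if only some are, NO if none are) is a simplifying assumption, not a transcription of assign_genome_properties.pl or pygenprop.

```python
import csv
import io

INTERPRO_COLUMN = 11  # 0-based index of the InterPro accession (12th TSV column)


def extract_interpro_ids(tsv_handle):
    """Steps 1-2: collect the unique InterPro accessions from an InterProScan TSV."""
    ids = set()
    for row in csv.reader(tsv_handle, delimiter="\t"):
        # Rows without an InterPro annotation may be shorter or hold "-" here.
        if len(row) > INTERPRO_COLUMN and row[INTERPRO_COLUMN] not in ("", "-"):
            ids.add(row[INTERPRO_COLUMN])
    return ids


def assign_steps(steps, interpro_ids):
    """Steps 3-4: a step is YES if any of its evidence IDs were found, else NO."""
    return {
        number: "YES" if step["evidence"] & interpro_ids else "NO"
        for number, step in steps.items()
    }


def assign_leaf_property(steps, step_results):
    """Step 5: simplified rule that only looks at required steps (an assumption)."""
    required = [number for number, step in steps.items() if step["required"]]
    found = sum(step_results[number] == "YES" for number in required)
    if required and found == len(required):
        return "YES"
    if found > 0:
        return "PARTIAL"
    return "NO"


# Toy InterProScan rows (made up for illustration).
example_tsv = io.StringIO(
    "prot1\tmd5\t300\tPfam\tPF00001\tdesc\t1\t100\t1e-10\tT\t05-12-2018\tIPR000001\tdesc\n"
    "prot2\tmd5\t250\tPfam\tPF00002\tdesc\t5\t90\t1e-8\tT\t05-12-2018\t-\t-\n"
    "prot3\tmd5\t410\tPfam\tPF00001\tdesc\t3\t120\t1e-12\tT\t05-12-2018\tIPR000001\tdesc\n"
)
ids = extract_interpro_ids(example_tsv)  # {"IPR000001"}
steps = {
    1: {"required": True, "evidence": {"IPR000001"}},
    2: {"required": True, "evidence": {"IPR999999"}},
}
step_results = assign_steps(steps, ids)  # {1: "YES", 2: "NO"}
print(assign_leaf_property(steps, step_results))  # PARTIAL
```

Using a set for the identifiers makes each evidence check a cheap intersection, which is the point of step 2.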

@LeeBergstrand
Collaborator

Looks like there are some anomalies. I am investigating.

@SilasK
Author

SilasK commented Dec 11, 2018 via email

@LeeBergstrand
Collaborator

@SilasK Completed in #33

@LeeBergstrand
Collaborator

Still on the develop branch. I'm going to be working on documentation.

@LeeBergstrand
Collaborator

A summary can be found here:

https://github.com/Micromeda/pygenprop/blob/d284f2bb26adfab2035f2eefd1c7d7f5ada07c29/pygenprop/testing/compare_assignment_to_assign_properties_perl.ipynb

There are some differences; however, these are due to assign_genome_properties.pl not working correctly.
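Disagreements like these can be listed with a small comparison helper. This sketch assumes both tools' results have already been loaded into plain dicts mapping a genome property accession to a result string; the helper name `diff_assignments` and the example data are hypothetical and are not taken from the linked notebook.

```python
def diff_assignments(python_results, perl_results):
    """Return {property_id: (python_result, perl_result)} for every disagreement.

    A property present in only one mapping is reported with None for the missing side.
    """
    differences = {}
    for prop_id in sorted(set(python_results) | set(perl_results)):
        ours = python_results.get(prop_id)
        theirs = perl_results.get(prop_id)
        if ours != theirs:
            differences[prop_id] = (ours, theirs)
    return differences


# Toy example with made-up results.
pygenprop_like = {"GenProp0001": "YES", "GenProp0002": "PARTIAL", "GenProp0003": "NO"}
perl_like = {"GenProp0001": "YES", "GenProp0002": "NO"}
for prop_id, (ours, theirs) in diff_assignments(pygenprop_like, perl_like).items():
    print(f"{prop_id}: pygenprop={ours} perl={theirs}")
```

Reporting properties that appear on only one side matters here, since the long-format Perl output does not cover higher-level properties at all.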
