Multi contig implementation #9

Open
wants to merge 25 commits into
base: master

JFsanchezherrero

Dear developers,

I forked the master version of IslandPath-DIMOB (commit: 34aad6c), then merged the gff branch commit (c24711b) and added some features.

I implemented multi-contig analysis, following your criteria and using the same functions. The only additional requirement is that each dinuc island and dimob island generated lies within a single contig.
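
To illustrate the same-contig requirement, here is a minimal Python sketch (the project itself is written in Perl). It is a simplified stand-in, not the IslandPath-DIMOB algorithm: the function name `cluster_within_contigs`, the tuple layout, and the per-gene `biased` flag are all hypothetical. It only shows that a run of flagged genes is closed whenever the contig changes, so two short runs on neighbouring contigs are never merged into one island.

```python
from itertools import groupby


def cluster_within_contigs(genes, min_genes=8):
    """Group consecutive flagged genes into islands, never across contigs.

    genes: list of (contig_id, start, end, biased) tuples, sorted by
    contig and then by start coordinate (hypothetical layout).
    Returns a list of (contig_id, island_start, island_end) tuples.
    """
    islands = []
    # Grouping on (contig, biased) ends a run when the contig changes
    # or when the bias flag flips.
    for (contig, biased), run in groupby(genes, key=lambda g: (g[0], g[3])):
        run = list(run)
        if biased and len(run) >= min_genes:
            islands.append((contig, run[0][1], run[-1][2]))
    return islands
```

With five flagged genes at the end of one contig and five at the start of the next, this returns no island at the threshold of 8, whereas a version that ignored contig boundaries would report a single ten-gene island.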

To implement this, I had to make some modifications:

Change:

  • Parsing of the ptt table now always returns the start_end coordinates together with the sequence id as the key.
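
The reason for adding the sequence id to the key can be sketched as follows, in Python rather than the project's Perl. The function `index_ptt_rows` and the row layout are hypothetical and do not reproduce the actual parser. They only show that a key built from `start_end` alone would collide when two contigs have genes at the same coordinates.

```python
def index_ptt_rows(rows):
    """Index ptt-like rows by (sequence id, "start_end").

    rows: iterable of (seq_id, start, end, product) tuples
    (hypothetical layout, not the real ptt column set).
    """
    table = {}
    for seq_id, start, end, product in rows:
        # Including seq_id keeps genes with identical coordinates on
        # different contigs from overwriting each other.
        table[(seq_id, f"{start}_{end}")] = product
    return table
```

Two rows with coordinates `100_400` on `contig_1` and `contig_2` yield two separate entries, where a coordinate-only key would keep just the last one.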

Fix some bugs:

Change input:
I exposed as input options several values that were previously set as fixed variables in the code.
For example, users can now supply the minimum number of genes under a dinuc bias (default 8) or the minimum GI size (default 8000).
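
As a rough illustration of turning fixed variables into options, here is a Python `argparse` sketch. The flag names `--min-dinuc-genes` and `--min-gi-size` are invented for this example and are not the options of the actual script. Only the default values 8 and 8000 come from the description above.

```python
import argparse


def build_parser():
    """Build a parser with the two thresholds as optional arguments."""
    parser = argparse.ArgumentParser(
        description="Hypothetical option layout for the thresholds"
    )
    # Flag names are assumptions; defaults follow the PR description.
    parser.add_argument("--min-dinuc-genes", type=int, default=8,
                        help="minimum number of genes under a dinuc bias")
    parser.add_argument("--min-gi-size", type=int, default=8000,
                        help="minimum genomic island size")
    return parser
```

Calling `build_parser().parse_args([])` keeps both defaults, while `parse_args(["--min-gi-size", "5000"])` overrides only the island size.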

Change output:

  • In addition to the GFF output, the dinuc bias analysis is now written to a csv file, which I think might be useful.
  • For each dimob genomic island identified, I also provide the annotation and details of the proteins involved. This might resolve issue #4 (Add generation of fasta and gbk for each GI as standard result files).
  • I also provide a list of the regions discarded due to length restrictions.
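
The list of discarded regions mentioned above could be produced along these lines. This is a Python sketch under stated assumptions: the function names, the csv columns, the inclusive length calculation, and the `>=` comparison against the minimum size are all hypothetical and are not taken from the actual code.

```python
import csv
import io


def split_by_length(islands, min_size=8000):
    """Separate islands into kept and discarded lists by length.

    islands: iterable of (contig_id, start, end) with inclusive coordinates.
    """
    kept, discarded = [], []
    for contig, start, end in islands:
        length = end - start + 1
        # Assumption: an island exactly min_size long is kept.
        target = kept if length >= min_size else discarded
        target.append((contig, start, end, length))
    return kept, discarded


def discarded_to_csv(discarded):
    """Render the discarded regions as csv text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["contig", "start", "end", "length"])
    writer.writerows(discarded)
    return buf.getvalue()
```

Keeping the discarded regions in their own table lets a user see how many candidates sit just under the threshold before deciding whether to lower it.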

Add example:

  • I have re-added the example files you previously discarded, so that there are some data sets to rely on. I always retrieved the same results with this version as with the original code.

  • I tested the multi-contig feature with some draft assemblies I have here in genbank format, and it worked for me. I cannot share that data for confidentiality reasons, but it might be appropriate to include some assemblies available on NCBI genomes for testing purposes. I downloaded and tested a genome of S. aureus (e.g. https://www.ncbi.nlm.nih.gov/assembly/GCA_900457655.1) containing several contigs/scaffolds in genbank format (gbf), which I have also included in the example folder.

I hope you find this new implementation useful. If you think it is appropriate to merge the pull request, please do; if not, that is fine. I will be using this new implementation within my own pipeline, citing you accordingly.

Thank you very much.
Please contact me if any further details are necessary.

3 participants