Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi contig implementation #9

Open
wants to merge 25 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
94edd28
Remove test data
Jan 11, 2019
c81c1c6
Add GFF output
Jan 17, 2019
f483e40
Add input sequence and strand to GFF output
Jan 31, 2019
c8d1fe0
Fix bioperl not finding id() in embl files
Jan 31, 2019
7aa93b9
Add fix to ignore features with joins containing start > end that bre…
glwinsor Jun 5, 2019
c24711b
Merge remote-tracking branch 'origin/geoff' into gff
Jun 6, 2019
154f595
Suppress experimental smartmatch
JFsanchezherrero Jun 25, 2019
52c7176
Let several contigs (def read_and_convert)
JFsanchezherrero Jun 25, 2019
7794abd
add csv outfile for dinuc bias
JFsanchezherrero Jun 25, 2019
7455041
Aesthetics, outfile to csv, loop bug fixed
JFsanchezherrero Jun 25, 2019
9f7a0d1
change header row
JFsanchezherrero Jun 25, 2019
ecbd4cf
Aesthetics, header row number
JFsanchezherrero Jun 25, 2019
afbff58
Add multi-contig analysis
JFsanchezherrero Jun 25, 2019
651d2a6
remove debugging messages
JFsanchezherrero Jun 26, 2019
7497c97
Add logging messages; GI min_length as variable
JFsanchezherrero Jun 27, 2019
02ec5a3
filter GI by min length
JFsanchezherrero Jun 27, 2019
323d4c0
control for different contig sequences
JFsanchezherrero Jun 27, 2019
1b56453
key as coordiantes_sequence
JFsanchezherrero Jun 27, 2019
7fe9c70
merge with gff branch
JFsanchezherrero Jun 27, 2019
b3e8ca2
add credits
JFsanchezherrero Jun 27, 2019
af28a7e
add example embl format
JFsanchezherrero Jun 27, 2019
b008436
add credits
JFsanchezherrero Jun 27, 2019
9049154
Update README.md
JFsanchezherrero Jun 27, 2019
5ce24bf
Update README.md
JFsanchezherrero Jun 27, 2019
b4b2a03
Add multi-contig example
JFsanchezherrero Jun 27, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
153 changes: 116 additions & 37 deletions Dimob.pl
Original file line number Diff line number Diff line change
Expand Up @@ -9,20 +9,33 @@ =head1 DESCRIPTION

=head1 SYNOPSIS

./Dimob.pl <genome.gbk> <outputfile.txt>
Example:
./Dimob.pl example/NC_003210.gbk NC_003210_GIs.txt
perl Dimob.pl <genome.gbk> <output_name> [cutoff_dinuc_bias] [min_length]

Default values:
cutoff_dinuc_bias = 8
min_length = 8000

Example:
perl Dimob.pl example/NC_003210.gbk NC_003210_GIs
perl Dimob.pl example/NC_003210.gbk NC_003210_GIs 6 10000
perl Dimob.pl example/NC_000913.embl NC_000913_GIs 6 10000

=head1 AUTHORS

Claire Bertelli
Claire Bertelli [Original author]
Email: [email protected]
Brinkman Laboratory
Simon Fraser University

Jose F. Sanchez-Herrero [Developer of this fork]
Email: [email protected]
Bioinformatics Facility Unit, Institut German Trias i Pujol (IGTP)
Badalona, Barcelona, Spain


=head1 LAST MAINTAINED

December 16th, 2016
June 27th, 2019

=cut

Expand All @@ -45,6 +58,17 @@ =head1 LAST MAINTAINED
use GenomeUtils;
use Dimob;

##
## New implementations
##
## Add multicontig function
## Fix smartmacth experimental warning message
## Fix dinuc bias loop iteration bug
## Use GI min_length as a variable
## Use cutoff_genes_dinuc as a variable
## merge with gff branch
## Output csv information for dinucleotide bias.
## Provide additional information in output GI information

MAIN: {

Expand All @@ -55,48 +79,73 @@ =head1 LAST MAINTAINED
#my $logger_conf = "$RealBin/logger.conf";

# usage help
my $usage = "Usage:\n./Dimob.pl <genome.gbk> <outputfile.txt>\nExample:\n./Dimob.pl example/NC_003210.gbk NC_003210_GIs.txt\n";
my $usage = "Usage:\nperl Dimob.pl <genome.gbk> <output_name> [cutoff_dinuc_bias] [min_length]\n";
$usage .= "\nDefault values:\n\tcutoff_dinuc_bias = 8\n\tmin_length = 8000\n\n";
$usage .= "Example:\n\tperl Dimob.pl example/NC_003210.gbk NC_003210_GIs\n";
$usage .= "\tperl Dimob.pl example/NC_003210.gbk NC_003210_GIs 6 10000\n";
$usage .= "\tperl Dimob.pl example/NC_000913.embl NC_000913_GIs 6 10000\n\n";

my ($inputfile, $outputfile) = @ARGV;
my ($inputfile, $output_name, $cutoff_dinuc_bias, $min_length) = @ARGV;

# Check that input file and output file are specified or die and print help message
unless(defined($inputfile) && defined($outputfile)){
unless(defined($inputfile) && defined($output_name)){
print $usage;
exit;
}

# Transform relative path to absolute path and
# check that input file is readable
$inputfile = File::Spec -> rel2abs($inputfile);
unless( -f $inputfile && -r $inputfile ) {
print "Error: $inputfile is not readable";
exit;
}


## min_length
if (!$min_length) { $min_length = 8000; }

# Create a dimob object
my $dimob_obj = Dimob->new(
{cfg_file => $cfname,
bindir => $RealBin,
workdir => $cwd,
MIN_GI_SIZE => 2000,
extended_ids => 1
{
cfg_file => $cfname,
bindir => $RealBin,
workdir => $cwd,
MIN_GI_SIZE => $min_length,
extended_ids => 1
}
);
# Recover the config from file, initialized during creation dimob_obj

# Recover the config from file, initialized during creation dimob_obj
my $cfg = Dimob::Config->config;
$cfg->{logger_conf} = "$RealBin/" . $cfg->{logger_conf};
$cfg->{hmmer_db} = "$RealBin/" . $cfg->{hmmer_db};
$cfg->{logger_conf} = $RealBin."/".$cfg->{logger_conf};
$cfg->{hmmer_db} = "$RealBin/".$cfg->{hmmer_db};

# Check that the logger exists and initializes it
print $cfg->{logger_conf};
#print $cfg->{logger_conf};
if($cfg->{logger_conf} && ( -r $cfg->{logger_conf})) {
Log::Log4perl::init($cfg->{logger_conf});
$logger = Log::Log4perl->get_logger;
$logger->debug("Logging initialized");
#$logger->debug("Logging initialized");
}

$logger->debug("IslandPath-DIMOB initialized");

## min_length
if (!$min_length) {
$min_length = 8000;
$logger->debug("Use default min_length: 8000 bp");
} else {
$logger->debug("Use min_length: $min_length");
}

# Transform relative path to absolute path and
# check that input file is readable
$inputfile = File::Spec -> rel2abs($inputfile);
unless( -f $inputfile && -r $inputfile ) {
print "Error: $inputfile is not readable";
exit;
}

## check if $cutoff_genes provided
if (!$cutoff_dinuc_bias) {
$cutoff_dinuc_bias = 8;
$logger->debug("Use cutoff_dinuc_bias default: 8");
} else {
$logger->debug("Use cutoff_dinuc_bias provided: ".$cutoff_dinuc_bias);
}

# Create a tmp directory to store intermediate results, copy the input file to the tmp
$logger->info("Creating temp directory with needed files");
my($filename, $dirs, $suffix) = fileparse($inputfile, qr/\.[^.]+$/);
Expand Down Expand Up @@ -132,25 +181,55 @@ =head1 LAST MAINTAINED
# Runs IslandPath-DIMOB on the genome files

$logger->info("Running IslandPath-DIMOB");
my @islands = $dimob_obj->run_dimob($inputfile);
my @islands = $dimob_obj->run_dimob($inputfile, $output_name, $cutoff_dinuc_bias);

$logger->info("Printing results");

my $outputfile = $output_name.".txt";
## txt output
open OUT_TXT, '>', $outputfile or die "Cannot open $outputfile: $!";
print OUT_TXT "##GI\tseq\tstart\tend\tstrand\n";

## gff output
my $gff_file = $output_name.".gff3";
open GFF, '>', $gff_file or die "Cannot open $gff_file: $!";
print GFF "##gff-version 3\n";

## discarded regions
my $discard_file = $output_name."_discard.txt";
open DISCARD, '>', $discard_file or die "Cannot open $discard_file: $!";
print DISCARD "##seq\tstart\tend\tlength\tstrand\n";

## loop through islands and print
my $i = 1;
open my $fhgd, '>', $outputfile or die "Cannot open output.txt: $!";
foreach my $island (@islands) {
my $start = $island->[0];
my $end = $island->[1];
print $fhgd "GI_$i\t$start\t$end\n";
$i++;
my $seq = $island->[0];
my $start = $island->[1];
my $end = $island->[2];
my $strand = $island->[3];

## discard if smaller than min length set
my $diff = $end - $start;
if ($diff < $min_length) {
print DISCARD "$seq\t$start\t$end\t$diff\t$strand\n";
} else {

#$logger->info("Warning: txt output is now depreciated. Support has been added to output GFF3 formatted documents. Use (any) other extension to enable GFF output. See: https://github.com/brinkmanlab/islandpath/issues/7");
print OUT_TXT "GI_$i\t$seq\t$start\t$end\t$strand\n";
print GFF "$seq\tislandpath\tgenomic_island\t$start\t$end\t.\t$strand\t.\tID=$seq\_gi$i\n";
$i++;
}
}
close $fhgd;

## close filehandles
close (GFF); close (OUT_TXT); close (DISCARD);

## finish
$logger->info("Removing tmp files");
unless(unlink glob "$inputfile.*") {
unless(unlink glob "$inputfile.*") {
$logger->error("Can't remove $inputfile: $!");
}
unless(rmdir $tmp_path) {
$logger->error("Can't remove $tmp_path: $!");
}
}

57 changes: 33 additions & 24 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,23 +5,20 @@ IslandPath-DIMOB is a standalone software to predict genomic islands in bacteria
Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin.
GIs are disproportionately associated with microbial adaptations of medical or environmental interest.

The latest IslandPath-DIMOB version is integrated in [IslandViewer 4](http://www.pathogenomics.sfu.ca/islandviewer/browse/), the leading integrated web interface for genomic island prediction.
This version here is a modification of the original version (https://github.com/brinkmanlab/islandpath) that allows IslandPath-DIMOB to process assembly draft genomes in multiple contigs and not using a reference genome.

## Install

A pre-built Docker image is available in the [brinkmanlab docker hub](https://hub.docker.com/r/brinkmanlab/islandpath/). Using this pre-installed version of IslandPath-DIMOB ensures the software runs according to expectations.
Although the original version has multiple instalation options: docker, github releases... for this particular version we would only rely on github clone.

Users wishing to install locally IslandPath-DIMOB can download a [release](http://github.com/brinkmanlab/islandpath/releases/) and install required perl libraries and HMMER listed below:
Clone the latest code from github:

Alternatively, you can also clone the latest code from github:

git clone https://github.com/brinkmanlab/islandpath

Please note that IslandPath-DIMOB predictions should only take a couple of minutes per bacterial genome. It was recently reported that IslandPath-DIMOB was extremely slow on Mac OS X with a conda installation of perl libraries. While we investigate the reason for this issue, we recommend using the Docker image.
git clone https://github.com/JFsanchezherrero/islandpath


**_Dependencies_**

No additional dependencies were added to this new implementation. IslandPath-DIMOB remains with the same original dependencies.

1. Though IslandPath-DIMOB should work with any OS, it has only been tested with linux.

2. Perl version 5.18.x or higher
Expand All @@ -42,14 +39,19 @@ HMMER can be obtained from http://hmmer.org/

## Run

IslandPath-DIMOB v1.0.0 takes as input an annotated complete genome as a genbank (.gbk) or an embl (.embl) file.
IslandPath-DIMOB v1.0.1_b takes as input an annotated complete/draft genome as a genbank (.gbk) or an embl (.embl) file.

# gbk file
./Dimob.pl example/NC_003210.gbk NC_003210_GIs.txt
# embl file
./Dimob.pl example/NC_000913.embl NC_000913_GIs.txt
perl Dimob.pl <genome.gbk> <output_name> [cutoff_dinuc_bias] [min_length]

Default values:
cutoff_dinuc_bias = 8
min_length = 8000

Example:
perl Dimob.pl example/NC_003210.gbk NC_003210_GIs
perl Dimob.pl example/NC_003210.gbk NC_003210_GIs 6 10000
perl Dimob.pl example/NC_000913.embl NC_000913_GIs 6 10000

## Citation

[Bertelli and Brinkman, 2018](https://doi.org/10.1093/bioinformatics/bty095)
Expand All @@ -58,26 +60,33 @@ IslandPath-DIMOB v1.0.0 takes as input an annotated complete genome as a genbank

## Questions? Comments? Bugs?

Email [email protected] (contact person: Claire Bertelli) and we'd be happy to help.

If you find a bug, please report it to [email protected] along with the
following information:

* version of Perl (output of 'perl -V' is best)
* version of IslandPath-DIMOB
* operating system type and version
* exact text of error message or description of problem
Email [email protected] (contact person: Claire Bertelli) for the original version.

If we don't have access to a system similar to yours, you may be asked to insert some debugging lines and report back on the results. The more help and information you can provide, the better!
Email [email protected] (contact person: Jose F. Sanchez-Herrero) or via github issue for the modification version.


## Copyright and License

IslandPath-DIMOB is distributed under the GNU General Public License. See also the LICENSE file included with this package.

Give credit to the original version of this software available at https://github.com/brinkmanlab/islandpath


## Versions - New features

### 27/06/2019 - IslandPath-DIMOB v1.0.1_b
Several new implementations were performed in order to add multicontig functionality.

We additionally fix some bugs and warning messages:
- Fix smartmacth experimental warning message
- Fix dinuc bias loop iteration bug https://github.com/brinkmanlab/islandpath/issues/8#issue-459228372

We increase the input and output options
- Use GI min_length as a variable
- Use cutoff_genes_dinuc as a variable
- Output csv information for dinucleotide bias.
- Provide additional information such as annotation of the genes within each GI

### 23/12/2016 - IslandPath-DIMOB v1.0.0
Increased recall and precision in the prediction of genomic islands based on the presence of dinucleotide bias and mobility genes. Standardization of input file types, and automatic generation of the other file types required by IslandPath-DIMOB.
Input: gbk or embl file
Expand Down
Loading