Updating metric parsing from the CLI
linsalrob committed Aug 24, 2020
1 parent 6bea62f commit 25fa95f
Showing 4 changed files with 26 additions and 4 deletions.
5 changes: 3 additions & 2 deletions PhiSpyModules/classification.py
@@ -44,8 +44,9 @@ def call_randomforest(**kwargs):
kwargs['metrics'].append('phmms')

if len(kwargs['metrics']) < len(all_metrics):
-log_and_message(f"Using the following metric(s): {', '.join(sorted(kwargs['metrics']))}.", c='GREEN', stderr=True, quiet=kwargs['quiet'])
-skip_metrics = [all_metrics.index(x) for x in set(all_metrics) - set(kwargs['metrics'])]
+used = b=set(kwargs['metrics']) & set(all_metrics)
+log_and_message(f"Using the following metric(s): {used}.", c='GREEN', stderr=True, quiet=kwargs['quiet'])
+skip_metrics = [all_metrics.index(x) for x in set(all_metrics) - used]
train_data = np.delete(train_data, skip_metrics, 1)
test_data = np.delete(test_data, skip_metrics, 1)
else:
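
The column-dropping step in this hunk can be sketched in isolation. This is a minimal illustration, not PhiSpy's code: the helper name `drop_unused_columns`, the contents and order of `all_metrics`, and the toy matrix are all assumptions made for the example. It shows that intersecting the requested metrics with the known ones means an unrecognised name is simply ignored rather than reaching `all_metrics.index`.

```python
import numpy as np


def drop_unused_columns(data, all_metrics, requested):
    """Keep only the columns of `data` whose metric name was requested.

    Hypothetical helper: column i of `data` is assumed to hold the values
    for all_metrics[i].
    """
    # Intersect first, so unknown names in `requested` are silently dropped.
    used = set(requested) & set(all_metrics)
    # Column indices of every metric that was not requested.
    skip = sorted(all_metrics.index(m) for m in set(all_metrics) - used)
    kept_names = [m for m in all_metrics if m in used]
    return np.delete(data, skip, axis=1), kept_names


# Assumed metric order, for illustration only.
all_metrics = ['orf_length_med', 'shannon_slope', 'at_skew',
               'gc_skew', 'max_direction', 'phmms']
toy = np.arange(18).reshape(3, 6)  # 3 rows, one column per metric

reduced, kept = drop_unused_columns(toy, all_metrics,
                                    ['gc_skew', 'at_skew', 'not_a_metric'])
```

Here `kept` lists the surviving metrics in column order, which is also a more readable thing to log than the raw `set`.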
7 changes: 6 additions & 1 deletion PhiSpyModules/helper_functions.py
@@ -92,7 +92,7 @@ def get_args():
parser.add_argument('--phage_genes', default=1, type=int,
help='The minimum number of genes that must be identified as belonging to a phage for the ' +
'region to be included. The default is %(default)d or more genes.')
-parser.add_argument('--metrics', nargs='+', type=str, default=['orf_length_med', 'shannon_slope', 'at_skew', 'gc_skew', 'max_direction'],
+parser.add_argument('--metrics', nargs='+', type=str, action="extend",
help='The set of metrics to consider during classification. If not set, all metrics (orf_length_med, shannon_slope, at_skew, gc_skew, max_direction) will be considered.')
parser.add_argument('-r', '--randomforest_trees', default=500, type=int,
help='Number of trees generated by Random Forest classifier. [Default: %(default)d]')
@@ -148,5 +148,10 @@ def get_args():
else:
args.output_dir=""


+# check whether any metrics were provided
+if not args.metrics:
+args.metrics = ['orf_length_med', 'shannon_slope', 'at_skew', 'gc_skew', 'max_direction']

args.logger = create_logger(args)
return args
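
A plausible reason for moving the default out of `add_argument` (the commit message does not say) is how `argparse` treats `action="extend"`: values from the command line are appended to whatever the default list holds, so a user asking for one metric would still get all the defaults. The sketch below contrasts the two approaches. The function names and the `DEFAULT_METRICS` constant are hypothetical, and `action="extend"` needs Python 3.8 or later.

```python
import argparse

# Same five names as in the diff; the constant itself is illustrative.
DEFAULT_METRICS = ['orf_length_med', 'shannon_slope', 'at_skew',
                   'gc_skew', 'max_direction']


def parse_with_default(argv):
    """Pitfall: a default list combined with action='extend'."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--metrics', nargs='+', type=str, action='extend',
                        default=list(DEFAULT_METRICS))
    # User-supplied values are appended to the defaults, not substituted.
    return parser.parse_args(argv).metrics


def parse_with_fallback(argv):
    """Pattern from the diff: no default, fill it in after parsing."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--metrics', nargs='+', type=str, action='extend')
    args = parser.parse_args(argv)
    # args.metrics is None when the flag was never given.
    if not args.metrics:
        args.metrics = list(DEFAULT_METRICS)
    return args.metrics


mixed = parse_with_default(['--metrics', 'gc_skew'])
chosen = parse_with_fallback(['--metrics', 'gc_skew', '--metrics', 'at_skew'])
fallback = parse_with_fallback([])
```

With the fallback pattern, repeating `--metrics` accumulates only the user's choices, and the defaults apply only when the flag is absent.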
16 changes: 16 additions & 0 deletions README.md
@@ -175,6 +175,22 @@ cat vog/* > VOGs.hmms
hmmpress VOGs.hmms
```

+### Metrics
+
+We use several metrics to predict prophage regions, and there are some optional metrics you can add. The default set of metrics is:
+
+- `orf_length_med`: the median ORF length
+- `shannon_slope`: the slope of Shannon's diversity of _k_-mers across the window under consideration. You can also expand this with the `--expand_slope` option.
+- `at_skew`: the normalized AT skew across the window under consideration
+- `gc_skew`: the normalized GC skew across the window under consideration
+- `max_direction`: the maximum number of genes in the same direction
+
+You can also add:
+
+- `phmms`: the [phmm](#HMM-Searches) search results
+- `phage_genes`: the number of genes that must be annotated as phage in the region
+- `nonprophage_genegaps`: the maximum number of non-phage genes between two phage-like regions that will enable them to be merged
+
# Help

For the help menu use the `-h` option:
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-4.1.17
+4.1.18
