Added hitmap wrapper #636

Open
wants to merge 5 commits into
base: master
11 changes: 11 additions & 0 deletions tools/hitmap/.shed.yml
@@ -0,0 +1,11 @@
categories:
- Proteomics
description: 'High-resolution MALDI imaging Proteomics Analysis'
homepage_url: https://github.com/MASHUOA/HiTMaP
long_description: |
Hitmap uses peptide mass fingerprint analysis and a dual scoring system to computationally assign peptide and protein annotations to high mass resolution MALDI-MSI datasets and generate customisable spatial distribution maps.

Note: HiT-MaP consumes large amounts of memory at runtime. Run it on a node with at least 64 GB of memory.
name: hitmap
owner: galaxyp
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/hitmap/
297 changes: 297 additions & 0 deletions tools/hitmap/hitmap.xml
@@ -0,0 +1,297 @@

<tool id="hitmap" name="HiT-MaP" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.05">
<description>High-resolution MALDI-imaging Proteomics Analysis</description>

<macros>
<import>macros.xml</import>
</macros>

<edam_topics>
Member

if you like please include a bio.tools identifier. The edam ontology is not strictly needed anymore, we will get those with the bio.tools IDs. See the xref tag.

Author

HiT-MaP doesn't have a listing in bio.tools. Should I leave this as edam topics?

Member

In this case, leave it! We could also add it to https://github.com/bio-tools/content, but this repo is not yet in production.
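For reference, once a bio.tools entry exists, the cross-reference mentioned in this thread could look roughly like the sketch below. The identifier `hitmap` is a placeholder assumption, since HiT-MaP has no bio.tools listing at the time of this review, and whether `xrefs` is accepted depends on the Galaxy version in use.

```xml
<!-- Sketch only: "hitmap" is a hypothetical bio.tools ID, not a registered one. -->
<xrefs>
    <xref type="bio.tools">hitmap</xref>
</xrefs>
```

Until such an entry exists, the `edam_topics` and `edam_operations` blocks in the diff remain the way to annotate the tool.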

<edam_topic>topic_0121</edam_topic>
</edam_topics>

<edam_operations>
<edam_operation>operation_3630</edam_operation>
</edam_operations>

<requirements>
<container type="docker">mashuoa/hitmap@sha256:97f2f853362b56be6c2b0fb1507e0288667e7fb6ad0eda3ff1d761dd72dfaad0</container>
</requirements>

<command detect_errors="exit_code"><![CDATA[
mkdir expdata &&
cp '${file_inputs.imzml_file}' expdata/sample.imzML &&
Member

is ln enough?

Author

Unfortunately not. Hitmap needs the file extension to be intact, but all galaxy data has extension '.dat'

Member

But instead of using cp you can do ln -s '${file_inputs.imzml_file}' expdata/sample.imzML

That saves a copy operation. And works in 95% of all cases. In some cases, the tool does not follow symlinks ... I hope this is not the case here.

Author

Mmm gave this a shot, but since it's a container, it was having difficulty creating the link.

I think this is an improvement which could appear in a later version of the wrapper.

HiTMaP has been very temperamental and hard to get working, so I'd rather just keep it as is rather than spending time on this optimization.
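For a later version of the wrapper, the symlink variant discussed in this thread could be sketched as follows. It keeps the same target names, so the link name still carries the extension HiT-MaP expects. This assumes the tool and the container runtime can follow symlinks into Galaxy's dataset directory, which the author reports was not the case in this setup.

```xml
<command detect_errors="exit_code"><![CDATA[
    mkdir expdata &&
    ## Link instead of copy; the link name carries the extension HiT-MaP expects.
    ln -s '${file_inputs.imzml_file}' expdata/sample.imzML &&
    ln -s '${file_inputs.ibd_file}' expdata/sample.ibd &&
    ln -s '${file_inputs.fasta_db}' expdata/database.fasta &&
    Rscript '${hitmap_script}'
]]></command>
```

If linking fails inside the container, the `cp` calls in the diff remain the safe fallback at the cost of one extra copy per input.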

cp '${file_inputs.ibd_file}' expdata/sample.ibd &&
cp '${file_inputs.fasta_db}' expdata/database.fasta &&
Rscript '${hitmap_script}' &&
python3 $__tool_directory__/hitmap_gen_html.py expdata &&
Member

single quote for the path

Author

do you mean $__tool_directory__/hitmap_gen_html.py -> '$__tool_directory__/hitmap_gen_html.py'?

Member

yes :) the reason is that you might have "spaces" in your tool_directory path .. in this case the tool breaks.

Author

haha oh yeah that's a great point. Can happen!
Added to the next commit :)
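The quoting change agreed in this thread would look roughly like this in the command block. It is a sketch of the two affected lines only, not the full command from the diff.

```xml
<command detect_errors="exit_code"><![CDATA[
    ## Single quotes keep a tool directory containing spaces as one shell argument.
    python3 '$__tool_directory__/hitmap_gen_html.py' expdata &&
    zip -r hitmap_output.zip expdata
]]></command>
```

Without the quotes, the shell would split a path containing a space into two arguments and `python3` would not find the script.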

zip -r hitmap_output.zip expdata
]]></command>

<configfiles>
<expand macro="hitmap_config_script" />
</configfiles>

<inputs>
<section name="file_inputs" title="File Inputs" expanded="True">
<param name="imzml_file"
Member

The Galaxy datatype imzml is a composite datatype, which means it contains 2 files. You can see how this can be used with the Cardinal tools.

https://github.com/galaxyproteomics/tools-galaxyp/blob/master/tools/cardinal/macros.xml#L18

Author

I see. I originally tried to do it that way, but couldn't work out how to upload two files (.imzml + .ibd) together as a composite type in the galaxy uploader.

I just went with the approach that the users upload the imzml and ibd as separate files into their history, then select those when performing the run.

What do you think about that approach?

Also thank you again so much for looking over these submissions and helping me out. I really appreciate it!

Member

> What do you think about that approach?

I think it can result in more errors on the user side, such as picking the wrong combination of files.
Do you know why this did not work for you? Can you try this on EU? I think it's working on EU.

Author

Ohhhh I see!
Yes this is working fine, I just wasn't aware of that option. Very neat!
Added to the next commit
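A minimal sketch of the composite-datatype approach suggested in this thread is shown below. It replaces the separate imzML and ibd parameters with a single `imzml` input and copies the two parts out of the dataset's extra files directory. The part names `imzml` and `ibd` are an assumption based on the pattern in the linked Cardinal macros and should be checked against that file.

```xml
<!-- Sketch only: one composite imzml input instead of separate imzML and ibd params. -->
<param name="imzml_file" type="data" format="imzml"
       label="MALDI-imaging imzML dataset"
       help="Composite imzML dataset containing both the imzML and ibd parts." />

<command detect_errors="exit_code"><![CDATA[
    mkdir expdata &&
    ## Assumed part names "imzml" and "ibd", following the Cardinal macros.
    cp '${file_inputs.imzml_file.extra_files_path}/imzml' expdata/sample.imzML &&
    cp '${file_inputs.imzml_file.extra_files_path}/ibd' expdata/sample.ibd
]]></command>
```

With a single composite input, users can no longer pair an imzML file with the wrong ibd file, which is the error the reviewer is concerned about.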

type="data"
format="mzml"
label="MALDI-imaging imzML file"
help="Input *.imzML file. Must have same basename as counterpart *.ibd. " />

<param name="ibd_file"
type="data"
format="binary"
label="MALDI-imaging binary ibd file"
help="Input *.ibd file. Must have same basename as *.imzML. " />

<param name="fasta_db"
type="data"
format="fasta"
label="Proteome"
help="Proteome of the organism in FASTA format. Proteomes for a particular organism can be downloaded from https://www.uniprot.org/proteomes/."/>
</section>


<section name="peptide_identification" title="Peptide Identification" expanded="True">
<param name="digestion_site"
type="select"
label="Digestion site"
help="Digestion enzyme used in sample preparation">
<option value="arg-c proteinase">arg-c proteinase</option>
<option value="asp-n endopeptidase">asp-n endopeptidase</option>
<option value="bnps-skatole">bnps-skatole</option>
<option value="caspase1">caspase1</option>
<option value="caspase2">caspase2</option>
<option value="caspase3">caspase3</option>
<option value="caspase4">caspase4</option>
<option value="caspase5">caspase5</option>
<option value="caspase6">caspase6</option>
<option value="caspase7">caspase7</option>
<option value="caspase8">caspase8</option>
<option value="caspase9">caspase9</option>
<option value="caspase10">caspase10</option>
<option value="chymotrypsin-high">chymotrypsin-high</option>
<option value="chymotrypsin-low">chymotrypsin-low</option>
<option value="clostripain">clostripain</option>
<option value="cnbr">cnbr</option>
<option value="enterokinase">enterokinase</option>
<option value="factor xa">factor xa</option>
<option value="formic acid">formic acid</option>
<option value="glutamyl endopeptidase">glutamyl endopeptidase</option>
<option value="granzyme-b">granzyme-b</option>
<option value="hydroxylamine">hydroxylamine</option>
<option value="iodosobenzoic acid">iodosobenzoic acid</option>
<option value="lysc">lysc</option>
<option value="lysn">lysn</option>
<option value="neutrophil elastase">neutrophil elastase</option>
<option value="nctb">nctb</option>
<option value="pepsin">pepsin</option>
<option value="pepsin1.3">pepsin1.3</option>
<option value="proline endopeptidase">proline endopeptidase</option>
<option value="proteinase k">proteinase k</option>
<option value="staphylococcal peptidase i">staphylococcal peptidase i</option>
<option value="thermolysin">thermolysin</option>
<option value="thrombin">thrombin</option>
<option value="trypsin" selected="True">trypsin</option>
</param>

<param name="missed_cleavages"
type="integer"
value="1"
default_value="1"
label="missed cleavages"
help="max expected number of missed peptide cleavages during digestion" />

<param name="ppm"
type="integer"
value="5"
default_value="5"
label="precursor ion m/z tolerance"
help="Error in parts per million (ppm) for matching the monoisotopic peak of a species with the observed peak in the experimental spectrum"/>

<param name="spectra_segments_per_file"
type="integer"
value="4"
default_value="4"
label="Spectra segments"
help="(Integer) Expected number of different spectra segments. Will usually be equal to the number of different tissue types in the sample." />

<param name="fdr_cutoff"
type="float"
min="0"
max="1"
value="0.05"
default_value="0.05"
label="FDR cutoff"
help="FDR threshold for peptide matching and protein annotation. An FDR of 0.05 means that no more than 5% of identified peptides are expected to be false positives. Uses the ratio of peptide scores in the real vs decoy database."/>
</section>

<section name="modifications" title="Modifications" expanded="False">
<param name="fixed"
type="text"
value=""
default_value=""
label="Fixed modification type"
help='Comma-separated list of fixed modifications using unimod standard. Either the Interim name or Accession # can be used. List of modifications can be found at http://www.unimod.org/. example: "Phospho" or "21" will both add the Phospho- modification' />

<param name="fixmod_position"
type="select"
label="Fixed modification position"
help="Position of the fixed modification">
<option value="None" selected="true">None</option>
<option value="Anywhere">Anywhere</option>
<option value="Any N-term">Any N-term</option>
<option value="Any C-term">Any C-term</option>
<option value="Protein N-term">Protein N-term</option>
<option value="Protein C-term">Protein C-term</option>
</param>

<param name="variable"
type="text"
value=""
default_value=""
label="Variable modification type"
help='Comma-separated list of variable modifications using unimod standard. Either the Interim name or Accession # can be used. List of modifications can be found at http://www.unimod.org/. example: "Phospho" or "21" will both add the Phospho- modification' />

<param name="varmod_position"
type="select"
label="Variable modification position"
help="Position of the variable modification">
<option value="None" selected="true">None</option>
<option value="Anywhere">Anywhere</option>
<option value="Any N-term">Any N-term</option>
<option value="Any C-term">Any C-term</option>
<option value="Protein N-term">Protein N-term</option>
<option value="Protein C-term">Protein C-term</option>
</param>
</section>

<section name="cluster_ions" title="Proteins of interest" expanded="false">
<param name="proteins_of_interest"
type="text"
value=""
default_value=""
label="Cluster ion imaging"
help="Comma-separated list of proteins of interest. For each protein, an image map is created for the protein and its individual peptides, showing their distribution and concentration across the sample. The list is case-insensitive. Example: Crystallin, phakinin, filensin, Actin" />
</section>

<section name="advanced" title="Advanced" expanded="False">
<param name="threshold"
type="float"
min="0"
max="1"
value="0.005"
default_value="0.005"
label="Ion spectra noise threshold"
help="Minimum relative intensity for a signal to be treated as a peak. Used to exclude noise. Higher values reject more noise but may miss true peaks."/>

<param name="decoy_mode"
type="select"
label="Decoy database mode"
default_value="isotope"
help="Strategy for building decoy database. Isotope is recommended.">
<option value="isotope" selected="true">isotope</option>
<option value="elements">elements</option>
<option value="adducts">adducts</option>
</param>

<param name="adducts"
type="text"
value="M+H"
default_value="M+H"
label="adducts"
help='Quoted, comma-separated list of naturally occurring or sample-prep-induced adducts. H+ adducts are usually the most abundant, but Na+ adducts are also common depending on tissue and sample processing. Each adduct pattern added to the list greatly increases runtime and memory usage. Example 1: "M+H". Example 2: "M+H","M+Na".' />
</section>
</inputs>

<outputs>
<data name="html_report"
format="html"
from_work_dir="hitmap_report.html"
label="${tool.name} on ${on_string}: HTML report" />

<data name="out_data"
format="zip"
from_work_dir="hitmap_output.zip"
label="${tool.name} on ${on_string}: data package" />
</outputs>

<tests>
<test>
<param name="imzml_file" value="Bovin_lens.imzML" />
<param name="ibd_file" value="Bovin_lens.ibd" />
<param name="fasta_db" value="uniprot-bovin.fasta" />
<section name="peptide_identification">
<param name="digestion_site" value="trypsin" />
<param name="missed_cleavages" value="2" />
<param name="ppm" value="5" />
<param name="spectra_segments_per_file" value="4" />
<param name="fdr_cutoff" value="0.05" />
</section>
<section name="modifications">
<param name="fixed" value="" />
<param name="fixmod_position" value="None" />
<param name="variable" value="" />
<param name="varmod_position" value="None" />
</section>
<section name="cluster_ions">
<param name="proteins_of_interest" value="Crystallin, phakinin, filensin, Actin" />
</section>

<output name="html_report" ftype="html">
<assert_contents>
<has_text text='div id="segmentation" class="row"' />
<has_text text='div id="pca" class="row"' />
<has_text text='div id="scree" class="row"' />
<has_text text='div id="segment-mean-spectra" class="row"' />
<has_text text='div id="outliers" class="row"' />
</assert_contents>
</output>
</test>
</tests>

<help><![CDATA[

HiT-MaP
.......

**What it does**

HiT-MaP is an R package: a High-resolution Informatics Toolbox for MALDI-imaging Proteomics.

This tool is still in development and has been provided to the community for early access.

Github: https://github.com/MASHUOA/HiTMaP

Publication: https://doi.org/10.1038/s41467-021-23461-w

**Issues**

| There are known issues including parallel::mccollect and subscript out of bounds errors.
| Rerunning the job sometimes solves the issue.
| Please report errors using the bug icon and they will be raised with HiT-MaP developers.

**Input**

- MALDI imaging dataset (imzML + ibd)
- Proteome database (fasta)

(Examples of MALDI imaging datasets can be found at
https://www.ebi.ac.uk/pride/archive?keyword=imzml&sortDirection=DESC&page=0&pageSize=20)

**Output**

- HTML report (html)
- Zipped directory containing all program output files (zip)

|

]]></help>

<citations>
<citation type="doi">10.1038/s41467-021-23461-w</citation>
</citations>

</tool>