Add module openms/idmassaccuracy #6647

Merged

26 commits
2916f3b Setup cometadapter module (jonasscheid, Jan 9, 2024)
5bbf925 Merge branch 'master' into 4698-new-module-openmsthirdpartycometadapter (jonasscheid, Jan 22, 2024)
e2ae09d cometadapter draft (jonasscheid, Feb 4, 2024)
46f304d finalize tests (jonasscheid, Sep 11, 2024)
d264c69 remove defaults (jonasscheid, Sep 11, 2024)
7c93a66 enclosing input as channel (jonasscheid, Sep 11, 2024)
4ee4f0c replace collect with map (jonasscheid, Sep 11, 2024)
9c79721 add channel of again (jonasscheid, Sep 11, 2024)
99c6b3d fix input channles by joining them (jonasscheid, Sep 12, 2024)
22bef31 update correct snapshot (jonasscheid, Sep 12, 2024)
92c1f14 fix lint (jonasscheid, Sep 12, 2024)
3df19f1 fix snapshots, comet writes timestamps in output file (jonasscheid, Sep 12, 2024)
0003ab4 prettier (jonasscheid, Sep 12, 2024)
8b2fbf2 Update environment.yml (jonasscheid, Sep 12, 2024)
8dea6c6 Merge branch 'master' into 4698-new-module-openmsthirdpartycometadapter (jonasscheid, Sep 12, 2024)
897702a strip out suffix version tag, which differs between container and conda (jonasscheid, Sep 13, 2024)
b5b1a3c Merge branch 'nf-core:master' into 4698-new-module-openmsthirdpartyco… (jonasscheid, Sep 13, 2024)
6d7bc76 Merge branch 'master' into 6625-new-module-openms-idmassaccuracy (jonasscheid, Sep 13, 2024)
6d06897 move to version content check instead of hash (jonasscheid, Sep 13, 2024)
c1aacd3 align conda version and container version tag (jonasscheid, Sep 13, 2024)
280f55e Merge branch '4698-new-module-openmsthirdpartycometadapter' into 6625… (jonasscheid, Sep 13, 2024)
49b5a8b add idmassaccuracy module (jonasscheid, Sep 16, 2024)
165a57c Merge branch 'master' into 6625-new-module-openms-idmassaccuracy (SPPearce, Sep 16, 2024)
9ece877 shorten version parsing (jonasscheid, Sep 16, 2024)
190afc9 Merge branch '6625-new-module-openms-idmassaccuracy' of https://githu… (jonasscheid, Sep 16, 2024)
165eb4e Merge branch 'master' into 6625-new-module-openms-idmassaccuracy (SPPearce, Sep 16, 2024)
5 changes: 5 additions & 0 deletions modules/nf-core/openms/idmassaccuracy/environment.yml
@@ -0,0 +1,5 @@
channels:
  - conda-forge
  - bioconda
dependencies:
  - "bioconda::openms=3.1.0"
52 changes: 52 additions & 0 deletions modules/nf-core/openms/idmassaccuracy/main.nf
@@ -0,0 +1,52 @@
process OPENMS_IDMASSACCURACY {
    tag "$meta.id"
    label 'process_single'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/openms:3.1.0--h191ead1_4' :
        'biocontainers/openms:3.1.0--h191ead1_4' }"

    input:
    tuple val(meta), path(mzmls), path(idxmls)

    output:
    tuple val(meta), path("*frag_mass_err.tsv") , emit: frag_err
    tuple val(meta), path("*prec_mass_err.tsv") , emit: prec_err, optional: true
    path "versions.yml"                         , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"

    """
    IDMassAccuracy \\
        -in $mzmls \\
        -id_in $idxmls \\
        -out_fragment ${prefix}_frag_mass_err.tsv \\
        -threads $task.cpus \\
        $args

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        OpenMS: \$(FileInfo 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1 | cut -d '-' -f 1)
Contributor:

This could probably be simpler. What does the initial string look like?

Contributor Author:

Unfortunately, OpenMS doesn't provide a version flag. This is the initial string:

(csvkit) (base) ubuntu@jonas-ubuntu:/mnt/volume$ FileInfo 2>&1

FileInfo -- Shows basic information about the file, such as data ranges and file type.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_FileInfo.html
Version: 3.2.0-pre-develop-2024-09-06 Sep 10 2024, 21:19:45, Revision: f57a3b9
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al.. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods (2024). doi:
   10.1038/s41592-024-02197-7.

Usage:
  FileInfo <options>

Options (mandatory options marked with '*'):
  -in <file>*        Input file (valid formats: 'mzData', 'mzXML', 'mzML', 'sqMass', 'dta', 'dta2d', 'mgf', 'featureXML', 'consensusXML', 'idXML', 
                     'pepXML', 'fid', 'mzid', 'trafoXML', 'fasta', 'pqp')
  -in_type <type>    Input file type -- default: determined from file extension or content (valid: 'mzData', 'mzXML', 'mzML', 'sqMass', 'dta', 'dta2
                     d', 'mgf', 'featureXML', 'consensusXML', 'idXML', 'pepXML', 'fid', 'mzid', 'trafoXML', 'fasta', 'pqp')
  -out <file>        Optional output file. If left out, the output is written to the command line. (valid formats: 'txt')
  -m                 Show meta information about the whole experiment
  -p                 Shows data processing information
  -s                 Computes a five-number statistics of intensities, qualities, and widths
  -d                 Show detailed listing of all spectra and chromatograms (peak files only)
  -c                 Check for corrupt data in the file (peak files only)
  -v                 Validate the file only (for mzML, mzData, mzXML, featureXML, idXML, consensusXML, pepXML)
  -i                 Check whether a given mzML file contains valid indices (conforming to the indexedmzML standard)
                     
Common TOPP options:
  -ini <file>        Use the given TOPP INI file
  -threads <n>       Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>  Writes the default configuration file
  --help             Shows options
  --helphelp         Shows all options (including advanced)

No options given. Aborting!

    END_VERSIONS
    """

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"

    """
    touch ${prefix}_frag_mass_err.tsv
    touch ${prefix}_prec_mass_err.tsv

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        OpenMS: \$(FileInfo 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1 | cut -d '-' -f 1)
    END_VERSIONS
    """
}
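The review thread on the `OpenMS:` version line asks whether the parsing could be simpler. The sketch below runs the pipeline from `main.nf` and the shorter one proposed later in the thread against the `Version:` line quoted in the author's reply, assuming that line is representative; the container's released 3.1.0 build may print a different string.

```shell
#!/usr/bin/env bash
# Compare the two version-parsing pipelines discussed in the review.
# The sample line is copied from the FileInfo output quoted in the thread;
# real FileInfo output is multi-line, and grep isolates this line.
set -euo pipefail

sample_banner='Version: 3.2.0-pre-develop-2024-09-06 Sep 10 2024, 21:19:45, Revision: f57a3b9'

# Pipeline currently in main.nf: drop the "Version: " label, keep the first
# space-separated word, then strip everything after the first '-'.
parse_current() {
    grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1 | cut -d '-' -f 1
}

# Shorter pipeline from the review: the label "Version:" is field 1, so take
# field 2 directly and strip the suffix after the first '-'.
parse_short() {
    grep -E '^Version(.*)' | cut -d ' ' -f 2 | cut -d '-' -f 1
}

printf '%s\n' "$sample_banner" | parse_current   # prints 3.2.0
printf '%s\n' "$sample_banner" | parse_short     # prints 3.2.0
```

Both drop the `-pre-develop-...` suffix, which is what keeps the reported version identical between the conda package and the container.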
55 changes: 55 additions & 0 deletions modules/nf-core/openms/idmassaccuracy/meta.yml
@@ -0,0 +1,55 @@
name: "openms_idmassaccuracy"
description: Calculates a distribution of the mass error from given mass spectra and IDs.
keywords:
  - mass_error
  - openms
  - proteomics
tools:
  - "openms":
      description: "OpenMS is an open-source software C++ library for LC-MS data management and analyses"
      homepage: "https://openms.de"
      documentation: "https://openms.readthedocs.io/en/latest/index.html"
      tool_dev_url: "https://github.com/OpenMS/OpenMS"
      doi: "10.1038/s41592-024-02197-7"
      licence: ["BSD"]

input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. `[ id:'test' ]`
  - mzmls:
      type: file
      description: |
        List containing one or more mzML files
        e.g. `[ 'file1.mzML', 'file2.mzML' ]`
      pattern: "*.{mzML}"
  - idxmls:
      type: file
      description: |
        List containing one or more idXML files
        e.g. `[ 'file1.idXML', 'file2.idXML' ]`
      pattern: "*.{idXML}"

output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. `[ id:'test' ]`
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
  - frag_err:
      type: file
      description: TSV file containing the fragment mass errors
      pattern: "*frag_mass_err.{tsv}"
  - prec_err:
      type: file
      description: Optional TSV file containing the precursor mass errors
      pattern: "*prec_mass_err.{tsv}"

authors:
  - "@jonasscheid"
81 changes: 81 additions & 0 deletions modules/nf-core/openms/idmassaccuracy/tests/main.nf.test
@@ -0,0 +1,81 @@
nextflow_process {

    name "Test Process OPENMS_IDMASSACCURACY"
    script "../main.nf"
    process "OPENMS_IDMASSACCURACY"
    config "./nextflow.config"

    tag "modules"
    tag "modules_nfcore"
    tag "openms"
    tag "openms/idmassaccuracy"
    tag "thermorawfileparser"
    tag "openms/decoydatabase"
    tag "openmsthirdparty/cometadapter"

    setup {
        run("THERMORAWFILEPARSER") {
            script "../../../thermorawfileparser/main.nf"
            process {
                """
                input[0] = Channel.of([
                    [ id:'test'],
                    file(params.modules_testdata_base_path + 'proteomics/msspectra/PXD012083_e005640_II.raw', checkIfExists: true)
                ])
                """
            }
        }
        run("OPENMS_DECOYDATABASE") {
            script "../../../openms/decoydatabase/main.nf"
            process {
                """
                input[0] = Channel.of([
                    [ id:'test'],
                    file(params.modules_testdata_base_path + 'proteomics/database/UP000005640_9606.fasta', checkIfExists: true)
                ])
                """
            }
        }
        run("OPENMSTHIRDPARTY_COMETADAPTER") {
            script "../../../openmsthirdparty/cometadapter/main.nf"
            process {
                """
                input[0] = THERMORAWFILEPARSER.out.spectra.join(OPENMS_DECOYDATABASE.out.decoy_fasta)
                """
            }
        }
    }

    test("proteomics - openms - mass_error") {
        when {
            process {
                """
                input[0] = THERMORAWFILEPARSER.out.spectra.join(OPENMSTHIRDPARTY_COMETADAPTER.out.idxml)
                """
            }
        }
        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out).match() }
            )
        }
    }

    test("proteomics - openms - mass_error - stub") {
        options "-stub"
        when {
            process {
                """
                input[0] = THERMORAWFILEPARSER.out.spectra.join(OPENMSTHIRDPARTY_COMETADAPTER.out.idxml)
                """
            }
        }
        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out).match() }
            )
        }
    }
}
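The test setup chains the upstream modules with Nextflow's `join` operator, which pairs channel items sharing the same first element (the `meta` map), so `[meta, mzml]` and `[meta, idxml]` become `[meta, mzml, idxml]`, the tuple shape this module's input expects. As a rough analogy only, coreutils `join` does the same for text records keyed on their first field; the sample ids and file names below are hypothetical.

```shell
#!/usr/bin/env bash
# Analogy only: coreutils `join` matches records on their first field, much
# as Nextflow's `join` operator matches channel items on their first element.
# Unmatched keys are dropped by default in both.
set -euo pipefail

# Join two "<id> <file>" listings on the id column.
# coreutils join requires both inputs to be sorted on the key.
join_by_id() {
    join <(sort -k1,1 "$1") <(sort -k1,1 "$2")
}

spectra=$(mktemp)
idxml=$(mktemp)
trap 'rm -f "$spectra" "$idxml"' EXIT

# Hypothetical per-sample outputs; 'other' has no idXML and is dropped.
printf '%s\n' 'test test.mzML' 'other other.mzML' > "$spectra"
printf '%s\n' 'test test.idXML' > "$idxml"

join_by_id "$spectra" "$idxml"   # prints: test test.mzML test.idXML
```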
90 changes: 90 additions & 0 deletions modules/nf-core/openms/idmassaccuracy/tests/main.nf.test.snap
@@ -0,0 +1,90 @@
{
"proteomics - openms - mass_error - stub": {
"content": [
{
"0": [
[
{
"id": "test"
},
"test_frag_mass_err.tsv:md5,d41d8cd98f00b204e9800998ecf8427e"
]
],
"1": [
[
{
"id": "test"
},
"test_prec_mass_err.tsv:md5,d41d8cd98f00b204e9800998ecf8427e"
]
],
"2": [
"versions.yml:md5,84212cbbf016bbcae0eb1bb0fcc30db7"
],
"frag_err": [
[
{
"id": "test"
},
"test_frag_mass_err.tsv:md5,d41d8cd98f00b204e9800998ecf8427e"
]
],
"prec_err": [
[
{
"id": "test"
},
"test_prec_mass_err.tsv:md5,d41d8cd98f00b204e9800998ecf8427e"
]
],
"versions": [
"versions.yml:md5,84212cbbf016bbcae0eb1bb0fcc30db7"
]
}
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-09-16T05:12:06.655389198"
},
"proteomics - openms - mass_error": {
"content": [
{
"0": [
[
{
"id": "test"
},
"test_frag_mass_err.tsv:md5,c659254bc1305edde65eca890a6cf36f"
]
],
"1": [

],
"2": [
"versions.yml:md5,84212cbbf016bbcae0eb1bb0fcc30db7"
],
"frag_err": [
[
{
"id": "test"
},
"test_frag_mass_err.tsv:md5,c659254bc1305edde65eca890a6cf36f"
]
],
"prec_err": [

],
"versions": [
"versions.yml:md5,84212cbbf016bbcae0eb1bb0fcc30db7"
]
}
],
"meta": {
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
"timestamp": "2024-09-16T05:11:55.087464408"
}
}
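In the stub snapshot every TSV carries the checksum `d41d8cd98f00b204e9800998ecf8427e`, the MD5 of zero bytes, because the stub only `touch`es its outputs. In the real run, `prec_err` is empty: the script passes only `-out_fragment`, so no `*prec_mass_err.tsv` is written and the optional output emits nothing. A quick check of the empty-file hash, assuming GNU coreutils `md5sum` is available (macOS ships `md5` instead):

```shell
#!/usr/bin/env bash
# The checksum of an empty input, as produced by a stub's `touch`.
set -euo pipefail

empty_md5() {
    # md5sum prints "<hash>  -" for stdin; keep only the hash.
    printf '' | md5sum | cut -d ' ' -f 1
}

empty_md5   # prints d41d8cd98f00b204e9800998ecf8427e
```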
27 changes: 27 additions & 0 deletions modules/nf-core/openms/idmassaccuracy/tests/nextflow.config
Contributor:

How many of these variables are required?

Contributor Author:

All of them. The spectrum file and the identification file are both read in to compute the fragment errors (the output).

@@ -0,0 +1,27 @@
process {

    withName:OPENMSTHIRDPARTY_COMETADAPTER {
        ext.args = [
            "-instrument low_res",
            "-fragment_bin_offset 0.4",
            "-precursor_mass_tolerance 5",
            "-precursor_error_units 'ppm'",
            "-fragment_mass_tolerance 0.50025",
            "-digest_mass_range '800:5000'",
            "-max_variable_mods_in_peptide 1",
            "-precursor_charge '2:5'",
            "-activation_method 'CID'",
            "-variable_modifications 'Oxidation (M)'",
            "-enzyme 'unspecific cleavage'",
            "-spectrum_batch_size 0"
        ].join(' ').trim()
    }

    withName:OPENMS_IDMASSACCURACY {
        ext.args = [
            "-precursor_error_ppm",
            "-fragment_mass_tolerance 0.50025"
        ].join(' ').trim()
    }

}
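Each `ext.args` list is joined into a single string and substituted for `$args` in the module's script block. Because that substitution is Groovy string interpolation into the script text, bash later parses the embedded single quotes, so a value such as `'unspecific cleavage'` reaches the tool as one argument. The sketch below mimics this with `eval` and contrasts it with expanding an unquoted shell variable; `count_args` is a hypothetical helper and the args string is a shortened excerpt of the config above.

```shell
#!/usr/bin/env bash
# Why quoted values inside ext.args survive: the text is pasted into the
# script before bash parses it. `eval` reproduces that textual substitution.
set -euo pipefail

args="-enzyme 'unspecific cleavage' -spectrum_batch_size 0"

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Textual substitution: bash sees the quotes, so the enzyme name is one word.
eval "count_args $args"     # prints 4

# Plain variable expansion only word-splits; the quotes stay literal and
# the enzyme name is broken into two arguments.
# shellcheck disable=SC2086
count_args $args            # prints 5
```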
5 changes: 5 additions & 0 deletions modules/nf-core/openmsthirdparty/cometadapter/environment.yml
@@ -0,0 +1,5 @@
channels:
  - conda-forge
  - bioconda
dependencies:
  - "bioconda::openms-thirdparty=3.1.0"
55 changes: 55 additions & 0 deletions modules/nf-core/openmsthirdparty/cometadapter/main.nf
@@ -0,0 +1,55 @@
process OPENMSTHIRDPARTY_COMETADAPTER {
    tag "$meta.id"
    label 'process_high'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/openms-thirdparty:3.1.0--h9ee0642_4' :
        'biocontainers/openms-thirdparty:3.1.0--h9ee0642_4' }"

    input:
    tuple val(meta), path(mzml), path(fasta)

    output:
    tuple val(meta), path("*.idXML"), emit: idxml
    tuple val(meta), path("*.tsv")  , emit: pin, optional: true
    path "versions.yml"             , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"

    """
    CometAdapter \\
        -in $mzml \\
        -database $fasta \\
        -out ${prefix}.idXML \\
        -threads $task.cpus \\
        $args

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        CometAdapter: \$(CometAdapter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1 | cut -d '-' -f 1)
        Comet: \$(comet 2>&1 | grep -E "Comet version.*" | sed 's/Comet version //g' | sed 's/"//g')
    END_VERSIONS
    """

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"

    """
    touch ${prefix}.idXML
    touch ${prefix}_pin.tsv

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        CometAdapter: \$(CometAdapter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1 | cut -d '-' -f 1)
        Comet: \$(comet 2>&1 | grep -E "Comet version.*" | sed 's/Comet version //g' | sed 's/"//g')
Contributor:

Could probably simplify these too.

Contributor Author:

I'm confused about why this module appears in the diff against the main branch. It was already merged last week 🤔 Maybe because I developed both modules in parallel and merged them locally to work on both?

Contributor Author:

Agreed. This
FileInfo 2>&1 | grep -E '^Version(.*)' | cut -d ' ' -f 2 | cut -d '-' -f 1
did the job.

Contributor Author:

I'll address this version-parsing change in a separate PR for all OpenMS tools (they also need a container version bump).

    END_VERSIONS
    """
}