Support for Additional Profiles and Clusters #29

kylacochrane · 2024-09-27T21:03:21Z

This update lets the pipeline integrate additional reference profiles and clusters supplied through two user-provided database parameters:

--db_profiles will be incorporated through the APPEND_PROFILES process (which follows LOCIDEX_MERGE_REF).
--db_clusters will be integrated via the APPEND_CLUSTERS process (which follows CLUSTER_FILE).

Each parameter is required by its own process, and the two must be supplied together; it is not possible to supply only one.

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

github-actions · 2024-09-27T21:04:48Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit a78a8ad

✅ 144 tests passed
❔ 23 tests were ignored
❗ 4 tests had warnings

❗ Test warnings:

files_exist - File not found: conf/igenomes_ignored.config
nextflow_config - nf-validation has been detected in the pipeline. Please migrate to nf-schema: https://nextflow-io.github.io/nf-schema/latest/migration_guide/
nextflow_config - Config manifest.version should end in dev: 0.2.3
schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/gasnomenclature/master/nextflow_schema.json; found https://raw.githubusercontent.com/phac-nml/gasnomenclature/main/nextflow_schema.json

❔ Tests ignored:

files_exist - File is ignored: assets/nf-core-gasnomenclature_logo_light.png
files_exist - File is ignored: docs/images/nf-core-gasnomenclature_logo_dark.png
files_exist - File is ignored: docs/images/nf-core-gasnomenclature_logo_light.png
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/awsfulltest.yml
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
nextflow_config - Config variable ignored: params.max_cpus
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-gasnomenclature_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-gasnomenclature_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-gasnomenclature_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/gasnomenclature/gasnomenclature/.github/workflows/awstest.yml
actions_awsfulltest - actions_awsfulltest
pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-gasnomenclature_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowGasnomenclature.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-validation plugin
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.gm_thresholds= 10,5,0
nextflow_config - Config default value correct: params.gm_method= average
nextflow_config - Config default value correct: params.gm_delimiter= .
nextflow_config - Config default value correct: params.pd_distm= hamming
nextflow_config - Config default value correct: params.pd_missing_threshold= 1.0
nextflow_config - Config default value correct: params.pd_sample_quality_threshold= 1.0
nextflow_config - Config default value correct: params.pd_file_type= text
nextflow_config - Config default value correct: params.max_cpus= 4
nextflow_config - Config default value correct: params.max_memory= 2.GB
nextflow_config - Config default value correct: params.max_time= 1.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
plugin_includes - No wrong validation plugin imports have been found
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/base.config and Nextflow scripts.
modules_config - conf/modules.config found and not ignored.
modules_config - INPUT_ASSURE found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_MERGE_REF found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_MERGE_QUERY found in conf/modules.config and Nextflow scripts.
modules_config - PROFILE_DISTS found in conf/modules.config and Nextflow scripts.
modules_config - GAS_CALL found in conf/modules.config and Nextflow scripts.
modules_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.1

Run details

nf-core/tools version 3.0.1
Run at 2024-10-22 13:51:26

emarinier

Looks good to me!

emarinier · 2024-10-08T15:06:08Z

main.nf


+if ((params.db_profiles && !params.db_clusters) || (!params.db_profiles && params.db_clusters)) {


You don't have to change this, because it's probably easier to understand the way you have it, but you could do this with an XOR: true only if one of the two is true.

I don't think Python (edit: Nextflow/Groovy) has an actual logical XOR operator, so it might reduce to something like:

if bool(params.db_profiles) != bool(params.db_clusters):

but again, just a comment, not necessary to change.

kylacochrane · Oct 11, 2024

Great suggestion, Eric. I played around a bit and got it working here: 7d54942
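
The XOR idea from this thread can be sketched in runnable Python. This is only an illustration, not the pipeline's Groovy code: `exactly_one_set` and `check_db_params` are hypothetical helpers, and the file names are placeholders. Coercing each value with `bool()` first matters because the parameters hold paths (or nothing), not booleans.

```python
def exactly_one_set(db_profiles, db_clusters):
    """Return True when exactly one of the two parameters is set (logical XOR)."""
    # bool() maps None and "" to False, and any non-empty path string to True.
    return bool(db_profiles) != bool(db_clusters)


def check_db_params(db_profiles, db_clusters):
    """Raise if only one of --db_profiles / --db_clusters is supplied."""
    if exactly_one_set(db_profiles, db_clusters):
        raise ValueError("--db_profiles and --db_clusters must be provided together")


# Both set, or both unset, passes; a lone parameter would be rejected.
check_db_params("profiles.tsv", "clusters.tsv")
check_db_params(None, None)
```

On two booleans, `!=` gives the same result as XOR, which is why the comparison is enough here.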

sgsutcliffe · 2024-10-08T20:39:41Z

It looks good; nothing to change. I always have suggestions, though!

This could be made clearer in the README.md (or with additional error reporting, though that is more work). Basically, it should be emphasized that any address levels in the additional databases that are not present in the samplesheet addresses will be dropped. An error could be raised when the maximum address size in the samplesheet is smaller than the number of columns/levels in the database, since those extra levels will be dropped (I believe this follows from how csvtk concat works). For the README.md, you could emphasize that headers must match and show something like this for the example reference database:
address 1.1. ... . n

sample_id	l1	l2	...	ln
sampleA	1	1	...	1
sampleB	1	1	...	2
sampleC	2	1	...	1
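
The mismatch described above can be illustrated with a small check. This is a sketch under assumptions, not what the pipeline does: `dropped_levels` is a hypothetical helper, addresses are assumed to use `.` as the delimiter (the `gm_delimiter` default shown in the lint output), and the database header is assumed to be `sample_id` followed by one column per level.

```python
def dropped_levels(samplesheet_addresses, db_header, delimiter="."):
    """Return database level columns beyond the deepest samplesheet address."""
    # Deepest address in the samplesheet, counted in levels (e.g. "1.1.2" has 3).
    max_levels = max(len(a.split(delimiter)) for a in samplesheet_addresses)
    # Every header column except the sample identifier is treated as a level.
    level_columns = [c for c in db_header if c != "sample_id"]
    return level_columns[max_levels:]


# Two-level addresses in the samplesheet, three level columns in the database.
extra = dropped_levels(["1.1", "1.2", "2.1"], ["sample_id", "l1", "l2", "l3"])
if extra:
    print("database levels not present in the samplesheet:", ", ".join(extra))
```

A check like this could back either a README note or an explicit error; which of the two is preferable is left open in the comment above.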

apetkau

Thanks so much for making these changes Kyla. Great work 😄

A few in-line comments below

nextflow_schema.json

tests/pipelines/main_append_databases.nf.test

modules/local/append_clusters/main.nf

modules/local/append_profiles/main.nf

kylacochrane · 2024-10-15T14:06:28Z

Thank you everyone for your review and suggestions!
I have updated the README, but further changes will be needed once the new version of GAS CALL is in use, because it will introduce simplified formatting.

apetkau

Thanks so much Kyla for addressing all my comments. And for adding all those tests 😄. Amazing work.

I just have one more question given in-line.

modules/local/append_profiles/main.nf

apetkau

Thanks so much for all the great work you've done with this Kyla. It looks amazing and handles so many more situations with sample names. I really appreciate it 😄

apetkau · 2024-10-24T20:47:30Z

modules/local/append_clusters/main.nf

+    # Calculate the frequency of each sample_id across both sources
+    csvtk freq -t -f id combined_profiles.tsv > sample_counts.tsv
+
+    # For any sample_id that appears in both the reference and database, add a 'db_' prefix to the sample_id from the database


That's really cool. This would solve the issue with duplicates in all situations then 😄. Thanks so much.
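
The prefixing rule in the diff comment above can be sketched as follows. This is an illustration in Python rather than the module's csvtk commands: `prefix_duplicates` and the sample data are hypothetical, and rows are simplified to `(sample_id, address)` pairs.

```python
def prefix_duplicates(reference_rows, database_rows, prefix="db_"):
    """Combine rows, prefixing database sample IDs that also occur in the reference."""
    reference_ids = {sample_id for sample_id, _ in reference_rows}
    combined = list(reference_rows)
    for sample_id, address in database_rows:
        # Only database entries that collide with a reference ID are renamed.
        if sample_id in reference_ids:
            sample_id = prefix + sample_id
        combined.append((sample_id, address))
    return combined


reference = [("sampleA", "1.1.1"), ("sampleB", "1.1.2")]
database = [("sampleB", "1.1.2"), ("sampleC", "2.1.1")]
combined = prefix_duplicates(reference, database)
```

Renaming the database copy instead of dropping it keeps both records, so nothing is lost when the same ID appears in both sources.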

apetkau · 2024-10-24T20:48:30Z

tests/pipelines/main_append_databases.nf.test

+        }
+    }
+
+    test("Test pipeline when appended profiles or clusters have sample_id overlap") {


Thank you for adding this test 😄

kylacochrane added 2 commits September 27, 2024 16:43

Enhanced pipeline to support integration of additional profiles and c…

e65a7ae

…lusters for reference samples from database parameters

Fixed linting and EC errors

a451b7f

kylacochrane added 4 commits October 4, 2024 09:36

Refactored tests to comply with updated processes

c7d1127

Created new test and data for appending profiles and clusters

73eb9c6

Implemented validation for 'db_profiles' and 'db_clusters' parameters…

06952e3

… and added related tests

Updated documentation

0a91ec6

kylacochrane marked this pull request as ready for review October 4, 2024 20:53

Update nextflow_schema

9f11883

kylacochrane requested review from apetkau, emarinier and sgsutcliffe October 7, 2024 19:45

emarinier approved these changes Oct 8, 2024

sgsutcliffe approved these changes Oct 8, 2024

apetkau requested changes Oct 9, 2024

nextflow_schema.json (outdated, resolved)

tests/pipelines/main_append_databases.nf.test (resolved)

modules/local/append_clusters/main.nf (outdated, resolved)

modules/local/append_profiles/main.nf (outdated, resolved)

kylacochrane added 11 commits October 10, 2024 14:12

Update pipeline to append compressed (gz) files

28c23ae

Add compressed test files

bb91d0d

Updated schemas

e030807

Fixed linting issues

f887f1a

Lint issue resolved

72ab36d

Enhance append_profiles and append_clusters to prevent sample ID over…

1978efe

…laps and loci mismatches with samples from input

Add tests to verify no sample ID overlap and loci mismatches in appen…

8ae527f

…d_profiles and append_clusters functions

Add support for gzipped files in header check for append modules

3002b00

Implement XOR logic to enforce both --db_profiles and --db_clusters m…

7d54942

…ust be provided together

Update README for improved comprehension

e0433c5

Fixed liniting issues

2d83257

kylacochrane requested a review from apetkau October 15, 2024 14:10

apetkau reviewed Oct 15, 2024

modules/local/append_profiles/main.nf (outdated, resolved)

modules/local/append_profiles/main.nf (outdated, resolved)

kylacochrane added 2 commits October 15, 2024 16:50

Fix duplicate removal by adding sort before csvtk uniq on sample_id

9780166

Updated tests to verify duplicate removal after sorting by sample_id

7d2f30a

kylacochrane requested a review from apetkau October 15, 2024 21:03

kylacochrane added 4 commits October 22, 2024 09:39

Enhance append database module to prefix duplicate sample IDs

0563a30

Updated tests to validate correct prefixing of duplicated sample IDs

79924e2

Resolved merge conflict with CHANGELOG.md

8b5a418

Fixed EC linting errors

a78a8ad

apetkau approved these changes Oct 24, 2024

kylacochrane merged commit 574877c into dev Oct 24, 2024
4 checks passed
