Revise 'input_check' to 'input_assure'; enforce JSON key alteration to match the sample ID if a mismatch is detected #13

kylacochrane · 2024-06-06T21:00:28Z

This PR resolves an issue that arises when a .mlst.json file, generated by Locidex, retains the original IRIDA Next sample identifier after a sample is cloned into a new project, leading to a mismatch in identifiers.

To address this, the PR changes the previous input_check process: it now reads the .mlst.json file from Locidex and, if the JSON key does not match the sample identifier, overwrites the key with the sample identifier to ensure consistency.

The process has been renamed to input_assure for clarity.

An error_report.csv is generated that lists every sample whose JSON key was overwritten and states whether it is a query or a reference sample in the pipeline.
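
The key-overwrite step described above could look roughly like the following Python sketch. It assumes the MLST JSON has exactly one top-level key holding the sample's profile. The function name `assure_key`, the report columns, and the message wording are illustrative assumptions and are not taken from the PR's script.

```python
import csv
import io
import json


def assure_key(mlst: dict, sample_id: str, is_query: bool):
    """Return (mlst, report_row); report_row is None when the key already matches.

    Assumes `mlst` has exactly one top-level key (a simplification of this sketch).
    """
    [(key, profile)] = mlst.items()
    if key == sample_id:
        return mlst, None
    role = "Query" if is_query else "Reference"
    # Hypothetical message format, not the pipeline's actual wording.
    message = f"{role} sample: JSON key '{key}' replaced with '{sample_id}'"
    return {sample_id: profile}, [sample_id, message]


# Example: a cloned sample whose file still carries the original identifier.
original = {"sampleA": {"l1": "1", "l2": "2"}}
fixed, row = assure_key(original, "sampleA_clone", is_query=True)

# Write the (hypothetical) error report to an in-memory CSV.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["sample", "error_message"])
if row is not None:
    writer.writerow(row)

print(json.dumps(fixed))
print(buffer.getvalue())
```

Returning the report row instead of writing it inside the function keeps the renaming logic easy to test on its own.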

Info on added tests can be seen below in a separate comment.

github-actions · 2024-06-06T21:01:54Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 348fe95

✅ 146 tests passed
❔ 22 tests were ignored
❗ 1 tests had warnings

❗ Test warnings:

schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/gasnomenclature/master/nextflow_schema.json
Found https://raw.githubusercontent.com/phac-nml/gasnomenclature/main/nextflow_schema.json

❔ Tests ignored:

files_exist - File is ignored: assets/nf-core-gasnomenclature_logo_light.png
files_exist - File is ignored: docs/images/nf-core-gasnomenclature_logo_dark.png
files_exist - File is ignored: docs/images/nf-core-gasnomenclature_logo_light.png
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/awsfulltest.yml
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-gasnomenclature_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-gasnomenclature_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-gasnomenclature_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/gasnomenclature/gasnomenclature/.github/workflows/awstest.yml
actions_awsfulltest - actions_awsfulltest
pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-gasnomenclature_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowGasnomenclature.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: v0.0.1dev
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.gm_thresholds= 10,5,0
nextflow_config - Config default value correct: params.gm_method= average
nextflow_config - Config default value correct: params.gm_delimiter= .
nextflow_config - Config default value correct: params.pd_distm= hamming
nextflow_config - Config default value correct: params.pd_missing_threshold= 1.0
nextflow_config - Config default value correct: params.pd_sample_quality_threshold= 1.0
nextflow_config - Config default value correct: params.pd_file_type= text
nextflow_config - Config default value correct: params.pd_count_missing= false
nextflow_config - Config default value correct: params.max_cpus= 4
nextflow_config - Config default value correct: params.max_memory= 2.GB
nextflow_config - Config default value correct: params.max_time= 1.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
template_strings - Did not find any Jinja template strings (104 files)
schema_lint - Schema lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/base.config and Nextflow scripts.
modules_config - conf/modules.config found and not ignored.
modules_config - INPUT_ASSURE found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_MERGE_REF found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_MERGE_QUERY found in conf/modules.config and Nextflow scripts.
modules_config - PROFILE_DISTS found in conf/modules.config and Nextflow scripts.
modules_config - GAS_CALL found in conf/modules.config and Nextflow scripts.
modules_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-06-14 16:03:39

emarinier · 2024-06-06T21:19:03Z

workflows/gas_nomenclature.nf

-            throw new RuntimeException("Pipeline exiting: sample with ID ${meta.id} does not have matching MLST JSON file.")
-        }
-    }
+    match.view()


Do we want to leave the .view() in or was this for debugging?

kylacochrane · 2024-06-07

Haha - I had used it for debugging.
Although highlighting this also made me realize that we no longer need to store the id_match boolean in the metadata, as we won't be removing any samples from the analysis.

I have updated the Python code, the workflow, and the input_assure process to simplify this here: 95e40f6

emarinier · 2024-06-06T21:22:32Z

workflows/gas_nomenclature.nf

+    profiles = match.branch {
        query: !it[0].address
    }


Is this for only querying profiles that don't already have an address?

kylacochrane · 2024-06-07

Yes exactly 😄
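
For readers less familiar with Nextflow's `branch` operator, the split confirmed above can be pictured in Python. This is only an analogy: the `meta` dictionaries and the `reference` label are assumptions for illustration (the diff excerpt shows only the `query` condition), and the pipeline itself performs this step in Nextflow.

```python
def branch_by_address(samples):
    """Split (meta, path) pairs into query and reference lists.

    A sample counts as a query when its meta has no (or an empty) address,
    mirroring the `!it[0].address` condition. Treating everything else as a
    reference is an assumption of this sketch.
    """
    query, reference = [], []
    for meta, path in samples:
        (reference if meta.get("address") else query).append((meta, path))
    return query, reference


# Hypothetical sample records: one with an existing address, one without.
samples = [
    ({"id": "sample1", "address": "1.1.1"}, "sample1.mlst.json"),
    ({"id": "sampleQ", "address": None}, "sampleQ.mlst.json"),
]
query, reference = branch_by_address(samples)
print([meta["id"] for meta, _ in query])
print([meta["id"] for meta, _ in reference])
```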

apetkau

This is amazing @kylacochrane 😄 . Thanks so much.

In addition to the in-line comments, could you fix the failing test before we merge the PR?

bin/input_check.py

kylacochrane · 2024-06-13T15:35:31Z

The following tests have been replaced or added:

Added a test for mismatched JSON keys and sample IDs, covering both reference and query samples. This single test replaces the two previous tests that checked for removal of mismatched samples from the pipeline.
82c3a0d
Added tests for scenarios where the MLST JSON file contains multiple entries (keys). Two tests were added: one where one of the keys matches the sample ID, and another where none of the keys match the sample ID.
8e8ffa4
7c1b5dc
Added a test for gzipped MLST JSON files to ensure input_assure can handle compressed input.
3266330
Added a test for empty MLST JSON file(s).
7909673
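
The scenarios these tests cover can be sketched in Python as follows. This is a hedged illustration, not the code in the PR: the helper names `open_maybe_gzip` and `select_entry` are hypothetical, and treating the empty case and the no-match-among-multiple-keys case as errors is an assumption, since the comment above only says that these cases are tested.

```python
import gzip
import json
import os
import tempfile


def open_maybe_gzip(path, mode="rt"):
    """Open a plain or gzip-compressed text file based on its extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def select_entry(mlst: dict, sample_id: str) -> dict:
    """Reduce an MLST JSON object to a single entry keyed by sample_id."""
    if not mlst:
        raise ValueError(f"{sample_id}: MLST JSON file is empty")
    if sample_id in mlst:
        # One key matches: keep only that entry.
        return {sample_id: mlst[sample_id]}
    if len(mlst) > 1:
        # Several keys and none match: no safe way to pick one (assumed error).
        raise ValueError(f"{sample_id}: none of the keys {sorted(mlst)} match")
    # Single mismatched key: overwrite it with the sample ID.
    (profile,) = mlst.values()
    return {sample_id: profile}


# Round-trip through a gzipped file to exercise the compressed path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample1.mlst.json.gz")
    with gzip.open(path, "wt") as handle:
        json.dump({"sample1": {"l1": "1"}, "other": {"l1": "2"}}, handle)
    with open_maybe_gzip(path) as handle:
        loaded = json.load(handle)

print(select_entry(loaded, "sample1"))
```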

apetkau

Thanks so much Kyla for all your hard work and the updates you've made 😄

I have just one additional comment.

bin/input_assure.py

apetkau

Thanks so much for all the changes Kyla 😄 . Just one more comment.

modules/local/input_assure/main.nf

apetkau

This is amazing @kylacochrane . Thanks so much for all your hard work 😄

Revise 'input_check' to 'input_assure'; enforce JSON key alteration t…

be3ba26

…o match the sample ID if a mismatch is detected

kylacochrane marked this pull request as ready for review June 6, 2024 21:01

kylacochrane requested review from apetkau and emarinier June 6, 2024 21:01

emarinier requested changes Jun 6, 2024

kylacochrane added 3 commits June 7, 2024 08:47

Remove id_match from meta

95e40f6

Fix linting

9e20417

Updated error_message from input_assure

deb4349

emarinier approved these changes Jun 7, 2024

apetkau requested changes Jun 7, 2024

bin/input_check.py (outdated)

bin/input_check.py (outdated)

kylacochrane added 4 commits June 10, 2024 16:52

Update python script name to match process: input_assure.py

07fe2c6

Add 'fair = true' to input_assure process in modules.config for repro…

23c1397

…ducibility

Update input_assure.py to include additional check for multiple keys

c7252cf

Fixed linting issues

f7ed9d3

kylacochrane mentioned this pull request Jun 10, 2024

Supporting Mismatched Sample IDs phac-nml/gasclustering#21

Merged

kylacochrane added 10 commits June 12, 2024 15:09

Merge 'dev' into 'input_assure

7d93226

Resolve conflicts between dev and input_assure

7592bd3

Add test with gzipped MLST JSON file

3266330

Added test for mismatched IDs

82c3a0d

Update paths in samplesheet

0017090

Fix EC issues

3f181eb

Fix EC issues

1f52529

Removed unexpected character (#) in main.nf.test

ec347e4

Add test data for multiple keyed JSON file

7c1b5dc

Tests added to handle when there are multiple sample entries (keys) i…

8e8ffa4

…n provided MLST JSON file(s)

apetkau requested changes Jun 13, 2024

bin/input_assure.py (outdated)

kylacochrane added 2 commits June 13, 2024 16:10

Updated input_assure to identify when MLST JSON is empty. Added corre…

7909673

…sponding test

EC issue fix

da8c829

Create a new JSON output file in input_assure

6642b72

apetkau requested changes Jun 14, 2024

modules/local/input_assure/main.nf (outdated)

Ensure MLST JSON files from input_assure are gzipped

348fe95

apetkau approved these changes Jun 14, 2024

kylacochrane merged commit 11167bd into dev Jun 14, 2024
4 checks passed

kylacochrane deleted the input_assure branch June 14, 2024 20:44

kylacochrane restored the input_assure branch June 17, 2024 17:39

kylacochrane deleted the input_assure branch June 27, 2024 17:38
