Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise 'input_check' to 'input_assure'; enforce JSON key alteration to match the sample ID if a mismatch is detected #13

Merged
merged 22 commits into from
Jun 14, 2024

Conversation

kylacochrane
Copy link
Contributor

@kylacochrane kylacochrane commented Jun 6, 2024

This PR resolves an issue that arises when a .mlst.json file, generated by Locidex, retains the original IRIDA Next sample identifier after a sample is cloned into a new project, leading to a mismatch in identifiers.

To address this, the PR alters the previous input_check process, which now reads the .mlst.json file from Locidex and if the sample identifier does not match the JSON key, it is overwritten to ensure consistency.

The process has been renamed to input_assure for clarity.

An error_report.csv is generated to identify any samples where the JSON key has been forcefully altered and discloses whether they are a query or reference sample in the pipeline.

Info on added tests can be seen below in a separate comment.

…o match the sample ID if a mismatch is detected
@kylacochrane kylacochrane marked this pull request as ready for review June 6, 2024 21:01
@kylacochrane kylacochrane requested review from apetkau and emarinier June 6, 2024 21:01
Copy link

github-actions bot commented Jun 6, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 348fe95

+| ✅ 146 tests passed       |+
#| ❔  22 tests were ignored |#
!| ❗   1 tests had warnings |!

❗ Test warnings:

  • schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/gasnomenclature/master/nextflow_schema.json
    Found https://raw.githubusercontent.com/phac-nml/gasnomenclature/main/nextflow_schema.json

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-gasnomenclature_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-gasnomenclature_logo_dark.png
  • files_exist - File is ignored: docs/images/nf-core-gasnomenclature_logo_light.png
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-gasnomenclature_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-gasnomenclature_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-gasnomenclature_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/gasnomenclature/gasnomenclature/.github/workflows/awstest.yml
  • actions_awsfulltest - actions_awsfulltest
  • pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-06-14 16:03:39

throw new RuntimeException("Pipeline exiting: sample with ID ${meta.id} does not have matching MLST JSON file.")
}
}
match.view()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to leave the .view() in or was this for debugging?

Copy link
Contributor Author

@kylacochrane kylacochrane Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha - I had used it for debugging.
Although highlighting this also made me realize that we no longer need to store the id_match boolean in the meta data as we won't be removing any samples from the analysis.

I have updated the python code, the workflow, and the input_assure process to simplify this here: 95e40f6

Comment on lines 86 to 88
profiles = match.branch {
query: !it[0].address
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for only querying profiles that have don't already have an address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly 😄

Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing @kylacochrane 😄 . Thanks so much.

In addition to the in-line comments, could you fix the failing test prior to us merging the PR.

bin/input_check.py Outdated Show resolved Hide resolved
bin/input_check.py Outdated Show resolved Hide resolved
@kylacochrane
Copy link
Contributor Author

kylacochrane commented Jun 13, 2024

The following tests have been replaced or added:

  1. Added a test for mismatched JSON keys and sampleIDs, covering both reference and query samples. This single test replaces the two previous tests that checked for removal of mismatched samples from the pipeline.
    82c3a0d

  2. Added tests for scenarios where the MLST JSON file contains multiple entries (keys). Two tests were added: one verifying a match between a key and the sampleID, and another where none of the multiple keys match the sampleID.
    8e8ffa4
    7c1b5dc

  3. Introduced a test for handling gzipped MLST JSON files to ensure input_assure can manage compressed files effectively.
    3266330

  4. Added a test for when MLST JSON file(s) is/are empty.
    7909673

Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much Kyla for all your hard work and the updates you've made 😄

I have just one additional comment.

bin/input_assure.py Outdated Show resolved Hide resolved
Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for all the changes Kyla 😄 . Just one more comment.

modules/local/input_assure/main.nf Outdated Show resolved Hide resolved
Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing @kylacochrane . Thanks so much for all your hard work 😄

@kylacochrane kylacochrane merged commit 11167bd into dev Jun 14, 2024
4 checks passed
@kylacochrane kylacochrane deleted the input_assure branch June 14, 2024 20:44
@kylacochrane kylacochrane restored the input_assure branch June 17, 2024 17:39
@kylacochrane kylacochrane deleted the input_assure branch June 27, 2024 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants