Revise 'input_check' to 'input_assure'; enforce JSON key alteration to match the sample ID if a mismatch is detected #13

Merged
merged 22 commits into from
Jun 14, 2024
be3ba26
Revise 'input_check' to 'input_assure'; enforce JSON key alteration t…
kylacochrane Jun 6, 2024
95e40f6
Remove id_match from meta
kylacochrane Jun 7, 2024
9e20417
Fix linting
kylacochrane Jun 7, 2024
deb4349
Updated error_message from input_assure
kylacochrane Jun 7, 2024
07fe2c6
Update python script name to match process: input_assure.py
kylacochrane Jun 10, 2024
23c1397
Add 'fair = true' to input_assure process in modules.config for repro…
kylacochrane Jun 10, 2024
c7252cf
Update input_assure.py to include additional check for multiple keys
kylacochrane Jun 10, 2024
f7ed9d3
Fixed linting issues
kylacochrane Jun 10, 2024
7d93226
Merge 'dev' into 'input_assure
kylacochrane Jun 12, 2024
7592bd3
Resolve conflicts between dev and input_assure
kylacochrane Jun 12, 2024
3266330
Add test with gzipped MLST JSON file
kylacochrane Jun 12, 2024
82c3a0d
Added test for mismatched IDs
kylacochrane Jun 12, 2024
0017090
Update paths in samplesheet
kylacochrane Jun 12, 2024
3f181eb
Fix EC issues
kylacochrane Jun 12, 2024
1f52529
Fix EC issues
kylacochrane Jun 12, 2024
ec347e4
Removed unexpected character (#) in main.nf.test
kylacochrane Jun 12, 2024
7c1b5dc
Add test data for multiple keyed JSON file
kylacochrane Jun 13, 2024
8e8ffa4
Tests added to handle when there are multiple sample entries (keys) i…
kylacochrane Jun 13, 2024
7909673
Updated input_assure to identify when MLST JSON is empty. Added corre…
kylacochrane Jun 13, 2024
da8c829
EC issue fix
kylacochrane Jun 13, 2024
6642b72
Create a new JSON output file in input_assure
kylacochrane Jun 13, 2024
348fe95
Ensure MLST JSON files from input_assure are gzipped
kylacochrane Jun 14, 2024
89 changes: 89 additions & 0 deletions bin/input_assure.py
@@ -0,0 +1,89 @@
#!/usr/bin/env python

import json
import argparse
import csv
import gzip
import sys


def open_file(file_path, mode):
    # Open a file based on the file extension
    if file_path.endswith(".gz"):
        return gzip.open(file_path, mode)
    else:
        return open(file_path, mode)


def check_inputs(json_file, sample_id, address, output_error_file, output_json_file):
    with open_file(json_file, "rt") as f:
        json_data = json.load(f)

    # Define a variable to store the match_status (True or False)
    match_status = sample_id in json_data

    # Initialize the error message
    error_message = None

    # Check for multiple keys in the JSON file and define error message
    keys = list(json_data.keys())
    original_key = keys[0] if keys else None

    if len(keys) == 0:
        error_message = f"{json_file} is completely empty!"
        print(error_message)
        sys.exit(1)
    elif len(keys) > 1:
        # Check if sample_id matches any key
        if not match_status:
            error_message = f"No key in the MLST JSON file ({json_file}) matches the specified sample ID '{sample_id}'. The first key '{original_key}' has been forcefully changed to '{sample_id}' and all other keys have been removed."
            # Retain only the specified sample ID
            json_data = {sample_id: json_data.pop(original_key)}
        else:
            error_message = f"MLST JSON file ({json_file}) contains multiple keys: {keys}. The MLST JSON file has been modified to retain only the '{sample_id}' entry"
            # Remove all keys except the one matching sample_id
            json_data = {sample_id: json_data[sample_id]}
    elif not match_status:
        # Define error message based on meta.address (query or reference)
        if address == "null":
            error_message = f"Query {sample_id} ID and JSON key in {json_file} DO NOT MATCH. The '{original_key}' key in {json_file} has been forcefully changed to '{sample_id}': User should manually check input files to ensure correctness."
        else:
            error_message = f"Reference {sample_id} ID and JSON key in {json_file} DO NOT MATCH. The '{original_key}' key in {json_file} has been forcefully changed to '{sample_id}': User should manually check input files to ensure correctness."
        # Replace the original key with the sample ID
        json_data[sample_id] = json_data.pop(original_key)

    # Write file containing relevant error messages
    if error_message:
        with open(output_error_file, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["sample", "JSON_key", "error_message"])
            writer.writerow([sample_id, keys, error_message])

    # Write the (possibly updated) JSON data to a new gzipped output file
    with gzip.open(output_json_file, "wt") as f:
        json.dump(json_data, f, indent=4)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Check sample inputs, force change if ID ≠ KEY, and generate an error report."
    )
    parser.add_argument("--input", help="Path to the mlst.json file.", required=True)
    parser.add_argument(
        "--sample_id", help="Sample ID to check in the JSON file.", required=True
    )
    parser.add_argument(
        "--address", help="Address to use in the error message.", required=True
    )
    parser.add_argument(
        "--output_error", help="Path to the error report file.", required=True
    )
    parser.add_argument(
        "--output_json", help="Path to the MLST JSON file (gzipped).", required=True
    )

    args = parser.parse_args()

    check_inputs(
        args.input, args.sample_id, args.address, args.output_error, args.output_json
    )
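For a non-empty file, the branching in `check_inputs` reduces to three outcomes: the entry is kept as-is, extra keys are trimmed, or the first key is renamed. The sketch below restates that decision as a pure function with no file I/O. `reconcile_keys` and its status strings are hypothetical names used only for illustration and are not part of this PR; the empty case raises `ValueError` here rather than calling `sys.exit(1)`, so that it can be exercised in a test.

```python
def reconcile_keys(json_data, sample_id):
    """Mirror the key handling in check_inputs without touching files.

    Hypothetical helper for illustration; returns (new_data, status).
    """
    keys = list(json_data.keys())
    if not keys:
        # The script prints a message and exits with status 1 at this point.
        raise ValueError("MLST JSON is empty")
    if sample_id in json_data:
        # Matching key: keep only that entry (a no-op for single-key files).
        status = "unchanged" if len(keys) == 1 else "trimmed"
        return {sample_id: json_data[sample_id]}, status
    # No matching key: the first key's profile is re-labelled with sample_id
    # and every other entry is dropped.
    return {sample_id: json_data[keys[0]]}, "renamed"


profile = {"l1": "1", "l2": "1", "l3": "2"}
print(reconcile_keys({"sample3": profile}, "sample3"))
print(reconcile_keys({"extra_key": profile, "sample3": profile}, "sample3"))
print(reconcile_keys({"sample4": profile, "extra_key": profile}, "sample3"))
```

The second and third calls correspond to the two multiple-key test fixtures added in this PR. In the script itself, every status other than "unchanged" also produces an error report row.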
55 changes: 0 additions & 55 deletions bin/input_check.py

This file was deleted.

6 changes: 4 additions & 2 deletions conf/modules.config
@@ -13,8 +13,6 @@
process {

// Publish directory names
assembly_directory_name = "assembly"
summary_directory_name = "summary"
profile_dists_directory_name = "distances"
gas_call_directory_name = "call"

@@ -27,6 +25,10 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

withName: INPUT_ASSURE {
fair = true
}

withName: LOCIDEX_MERGE_REF {
publishDir = [
path: locidex_merge_ref_directory_name,
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
-process INPUT_CHECK{
-tag "Check Sample Inputs and Generate Error Report"
+process INPUT_ASSURE {
+tag "Assures Inputs are Consistent"
 label 'process_single'
-fair true

 container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
 'https://depot.galaxyproject.org/singularity/python:3.8.3' :
@@ -11,19 +10,19 @@ process INPUT_CHECK{
 tuple val(meta), path(mlst)

 output:
-tuple val(meta), path("${meta.id}_match.txt"), path(mlst), emit: match
+tuple val(meta), path("${meta.id}.mlst.json.gz"), emit: result
 tuple val(meta), path("*_error_report.csv"), optional: true, emit: error_report
 path("versions.yml"), emit: versions

 script:

 """
-input_check.py \\
+input_assure.py \\
 --input ${mlst} \\
 --sample_id ${meta.id} \\
 --address ${meta.address} \\
 --output_error ${meta.id}_error_report.csv \\
---output_match ${meta.id}_match.txt
+--output_json ${meta.id}.mlst.json.gz

 cat <<-END_VERSIONS > versions.yml
 "${task.process}":
27 changes: 27 additions & 0 deletions tests/data/irida/mismatched_iridanext.output.json
@@ -0,0 +1,27 @@
{
"files": {
"global": [],
"samples": {
"sampleR": [
{
"path": "input/sampleR_error_report.csv"
}
],
"sample2": [
{
"path": "input/sample2_error_report.csv"
}
]
}
},
"metadata": {
"samples": {
"sampleQ": {
"address": "2.2.3"
},
"sampleR": {
"address": "2.2.3"
}
}
}
}
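The expected IRIDA Next output above attaches an error report to `sampleR` and `sample2` only. As a rough illustration of how such an output could be inspected, the sketch below collects the sample IDs that have an `*_error_report.csv` entry. The function name `samples_with_error_reports` is hypothetical, and the embedded JSON is a copy of the fixture's `files` section rather than a file read from disk.

```python
import json

# Copy of the "files" section of the fixture shown above.
expected = json.loads("""
{
    "files": {
        "global": [],
        "samples": {
            "sampleR": [{"path": "input/sampleR_error_report.csv"}],
            "sample2": [{"path": "input/sample2_error_report.csv"}]
        }
    }
}
""")


def samples_with_error_reports(output):
    """Return sorted sample IDs that have at least one error report entry."""
    flagged = []
    for sample, entries in output["files"]["samples"].items():
        if any(entry["path"].endswith("_error_report.csv") for entry in entries):
            flagged.append(sample)
    return sorted(flagged)


flagged_samples = samples_with_error_reports(expected)
print(flagged_samples)
```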
Original file line number Diff line number Diff line change
@@ -2,9 +2,9 @@
"files": {
"global": [],
"samples": {
"sampleR": [
"sample3": [
{
"path": "input/sampleR_error_report.csv"
"path": "input/sample3_error_report.csv"
}
]
}
Binary file added tests/data/reports/sample1.mlst.json.gz
Binary file not shown.
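The gzipped fixture cannot be displayed in the diff. As a rough illustration of what `open_file` in `bin/input_assure.py` handles, the sketch below writes a small MLST-style JSON object to a temporary `.json.gz` file and reads it back in text mode. The profile values and the temporary file name are made up for the example and are not the contents of `sample1.mlst.json.gz`.

```python
import gzip
import json
import os
import tempfile


def open_file(file_path, mode):
    # Same extension-based dispatch as bin/input_assure.py
    if file_path.endswith(".gz"):
        return gzip.open(file_path, mode)
    return open(file_path, mode)


# Made-up profile; the real fixture's contents are not shown in the diff.
example = {"sample1": {"l1": "1", "l2": "1", "l3": "1"}}

with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "example.mlst.json.gz")
    # Write the same way the script writes its output: gzip in text mode.
    with gzip.open(gz_path, "wt") as handle:
        json.dump(example, handle, indent=4)
    # Mode "rt" makes gzip.open decode to text, so json.load works unchanged.
    with open_file(gz_path, "rt") as handle:
        round_tripped = json.load(handle)

print(round_tripped == example)
```

Because the script always writes its output with `gzip.open`, plain and gzipped inputs both end up as `.mlst.json.gz` downstream.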
1 change: 1 addition & 0 deletions tests/data/reports/sample2_empty.mlst.json
@@ -0,0 +1 @@
{}
12 changes: 12 additions & 0 deletions tests/data/reports/sample3_multiplekeys.mlst.json
@@ -0,0 +1,12 @@
{
"extra_key": {
"l1": "1",
"l2": "1",
"l3": "2"
},
"sample3": {
"l1": "1",
"l2": "1",
"l3": "2"
}
}
12 changes: 12 additions & 0 deletions tests/data/reports/sample3_multiplekeys_nomatch.mlst.json
@@ -0,0 +1,12 @@
{
"sample4": {
"l1": "1",
"l2": "1",
"l3": "2"
},
"extra_key": {
"l1": "1",
"l2": "1",
"l3": "2"
}
}
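This fixture is paired with sample ID `sample3` in `samplesheet-multiplekeys_nomatch.csv`. Since no key matches, `input_assure.py` falls back to the first key. "First" here means document order: `json.load` builds a `dict` in the order the keys appear in the file, and dicts preserve insertion order in Python 3.7 and later. The sketch below reproduces only that key selection, not the error report or the gzipped output.

```python
import json

# Contents of sample3_multiplekeys_nomatch.mlst.json, as shown above.
fixture_text = """
{
    "sample4": {"l1": "1", "l2": "1", "l3": "2"},
    "extra_key": {"l1": "1", "l2": "1", "l3": "2"}
}
"""
sample_id = "sample3"  # sample ID paired with this file in the samplesheet

data = json.loads(fixture_text)
first_key = list(data.keys())[0]  # first key in document order
expected_output = {sample_id: data[first_key]}  # all other keys are dropped

print(first_key)
print(json.dumps(expected_output, indent=4))
```

If the two entries in such a file held different profiles, reordering the keys would change which profile is kept, so the error report is worth checking whenever this branch is taken.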
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
sample,mlst_alleles,address
sampleR,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleF.mlst.json,
sampleQ,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleQ.mlst.json,
sampleR,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleF.mlst.json,
sample1,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample1.mlst.json,1.1.1
sample2,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample2.mlst.json,1.1.1
sample2,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample7.mlst.json,1.1.1
sample3,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample3.mlst.json,1.1.2

5 changes: 5 additions & 0 deletions tests/data/samplesheets/samplesheet-multiple_keys.csv
@@ -0,0 +1,5 @@
sample,mlst_alleles,address
sampleQ,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleQ.mlst.json,
sample1,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample1.mlst.json,1.1.1
sample2,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample2.mlst.json,1.1.1
sample3,https://raw.githubusercontent.com/phac-nml/gasnomenclature/input_assure/tests/data/reports/sample3_multiplekeys.mlst.json,1.1.2
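In these samplesheets the `address` column is empty for `sampleQ` and filled for the other samples, while `input_assure.py` words its message for a query when `--address` is the string `null` and for a reference otherwise. The sketch below classifies the rows of the samplesheet above on that basis. It assumes that an empty `address` cell marks a query sample; how the pipeline turns the empty cell into `null` is not shown in this diff, and the `roles` mapping is a name introduced only for this example.

```python
import csv
import io

# Contents of samplesheet-multiple_keys.csv, as shown above.
samplesheet_text = """sample,mlst_alleles,address
sampleQ,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleQ.mlst.json,
sample1,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample1.mlst.json,1.1.1
sample2,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample2.mlst.json,1.1.1
sample3,https://raw.githubusercontent.com/phac-nml/gasnomenclature/input_assure/tests/data/reports/sample3_multiplekeys.mlst.json,1.1.2
"""

rows = list(csv.DictReader(io.StringIO(samplesheet_text)))

# Assumption: an empty address cell marks a query sample, anything else a reference.
roles = {
    row["sample"]: ("query" if row["address"] == "" else "reference")
    for row in rows
}
print(roles)
```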
5 changes: 5 additions & 0 deletions tests/data/samplesheets/samplesheet-multiplekeys_nomatch.csv
@@ -0,0 +1,5 @@
sample,mlst_alleles,address
sampleQ,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleQ.mlst.json,
sample1,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample1.mlst.json,1.1.1
sample2,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample2.mlst.json,1.1.1
sample3,https://raw.githubusercontent.com/phac-nml/gasnomenclature/input_assure/tests/data/reports/sample3_multiplekeys_nomatch.mlst.json,1.1.2
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
sample,mlst_alleles,address
sampleQ,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleF.mlst.json,
sampleQ,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleQ.mlst.json,
sample1,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample1.mlst.json,1.1.1
sample2,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample7.mlst.json,1.1.1
sample2,https://raw.githubusercontent.com/phac-nml/gasnomenclature/input_assure/tests/data/reports/sample2_empty.mlst.json,1.1.1
sample3,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample3.mlst.json,1.1.2

5 changes: 5 additions & 0 deletions tests/data/samplesheets/samplesheet_gzip.csv
@@ -0,0 +1,5 @@
sample,mlst_alleles,address
sampleQ,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sampleQ.mlst.json,
sample1,https://raw.githubusercontent.com/phac-nml/gasnomenclature/input_assure/tests/data/reports/sample1.mlst.json.gz,1.1.1
sample2,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample2.mlst.json,1.1.1
sample3,https://raw.githubusercontent.com/phac-nml/gasnomenclature/dev/tests/data/reports/sample3.mlst.json,1.1.2