diff --git a/eva_sub_cli/etc/eva_logo.png b/eva_sub_cli/etc/eva_logo.png
index 7e018a7..b731de2 100644
Binary files a/eva_sub_cli/etc/eva_logo.png and b/eva_sub_cli/etc/eva_logo.png differ
diff --git a/eva_sub_cli/report.py b/eva_sub_cli/report.py
index f546b92..8fdeac9 100644
--- a/eva_sub_cli/report.py
+++ b/eva_sub_cli/report.py
@@ -1,5 +1,6 @@
import base64
import os.path
+import re
from jinja2 import Environment, FileSystemLoader
@@ -35,10 +36,5 @@ def generate_html_report(validation_results, validation_date, submission_dir, vc
vcf_fasta_analysis_mapping=vcf_fasta_analysis_mapping,
validation_results=validation_results
)
+ return re.sub(r'\s+\n', '\n', rendered_template)
- try:
- # minify-html is not included in conda installation currently
- from minify_html import minify_html
- return minify_html.minify(rendered_template, minify_js=True, remove_processing_instructions=True)
- except ImportError:
- return rendered_template
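Review note: the `re.sub` call in this hunk replaces the optional `minify_html` step with a plain whitespace cleanup. The sketch below illustrates what that pattern does to rendered output. The helper name `strip_trailing_whitespace` and the sample string are hypothetical and are not part of `report.py`; only the pattern itself comes from the change.

```python
import re


def strip_trailing_whitespace(rendered: str) -> str:
    """Collapse every whitespace run that ends in a newline into one newline.

    Hypothetical helper mirroring the pattern used in this hunk. A raw
    string is used so that the backslash in \\s reaches the regex engine
    without Python treating it as a string escape.
    """
    return re.sub(r'\s+\n', '\n', rendered)


if __name__ == "__main__":
    # Made-up sample: trailing spaces, blank lines, and a trailing tab.
    sample = "<div>  \n\n\n   <p>text</p>\t\n</div>"
    print(repr(strip_trailing_whitespace(sample)))
```

Trailing whitespace and runs of blank lines are dropped, while the indentation at the start of the next non-blank line is kept, because the match has to end on a newline. This is consistent with the expected report fixtures growing from 22 lines to several hundred: the output is no longer minified, only tidied.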
diff --git a/requirements.txt b/requirements.txt
index b2a450b..1c20ce8 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,6 @@
ebi_eva_common_pyutils==0.6.10
jinja2
jsonschema
-minify_html==0.11.1
openpyxl
pyyaml
requests
diff --git a/tests/resources/validation_reports/expected_report_metadata_json.html b/tests/resources/validation_reports/expected_report_metadata_json.html
index 6ef2da7..f5ef23d 100644
--- a/tests/resources/validation_reports/expected_report_metadata_json.html
+++ b/tests/resources/validation_reports/expected_report_metadata_json.html
@@ -1,22 +1,473 @@
-
Validation ReportProject Summary
General details about the project
Project Title: My cool project
Validation Date: 2023-08-31 12:34:56
Submission Directory: /test/submission/dir
▶ Files mapping
VCF File | Fasta File | Analysis |
---|
input_fail.vcf | input_fail.fa | A |
input_pass.vcf | input_pass.fa | B |
input_test.vcf | input_test.fa | could not be linked |
Metadata validation results
Ensures that required fields are present and values are formatted correctly. For requirements, please refer to the
EVA website.
▶ ❌ Metadata validation check
Full report: /path/to/json/metadata/report
JSON Property | Error Description |
---|
.files | should have required property 'files' |
/project.title | should have required property 'title' |
/project.description | should have required property 'description' |
/project.taxId | should have required property 'taxId' |
/project.centre | should have required property 'centre' |
/analysis/0.analysisTitle | should have required property 'analysisTitle' |
/analysis/0.description | should have required property 'description' |
/analysis/0.experimentType | should have required property 'experimentType' |
/analysis/0.referenceGenome | should have required property 'referenceGenome' |
/sample/0.bioSampleAccession | should have required property 'bioSampleAccession' |
/sample/0.bioSampleObject | should have required property 'bioSampleObject' |
/sample/0 | should match exactly one schema in oneOf |
VCF validation results
Checks whether each file is compliant with the
VCF specification. Also checks whether the variants' reference alleles match against the reference assembly.
input_fail.vcf
▶ ❌ Assembly check: 26/36 (72.22%)
First 10 errors per category are below. Full report: /path/to/assembly_failed/report
Category | Error |
---|
Parsing Error | The assembly checking could not be completed: Contig 'chr23' not found in assembly report |
mismatch error | Chromosome 1, position 35549, reference allele 'G' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35595, reference allele 'G' does not match the reference sequence, expected 'a' |
mismatch error | Chromosome 1, position 35618, reference allele 'G' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35626, reference allele 'A' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35639, reference allele 'T' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35643, reference allele 'T' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35717, reference allele 'T' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35819, reference allele 'T' does not match the reference sequence, expected 'a' |
mismatch error | Chromosome 1, position 35822, reference allele 'T' does not match the reference sequence, expected 'c' |
▶ ❌ VCF check: 1 critical errors, 1 non-critical errors
First 10 errors per category are below. Full report: /path/to/vcf_failed/report
Category | Error |
---|
critical error | Line 4: Error in meta-data section. |
non-critical error | Sample #11, field AD does not match the meta specification Number=R (expected 2 value(s)). AD=.. |
input_passed.vcf
✔ Assembly check: 247/247 (100.0%)
✔ VCF check: 0 critical errors, 0 non-critical errors
Sample name concordance check
Checks whether information in the metadata is concordant with that contained in the VCF files, in particular sample names.
▶ ❌ Analysis A: Sample names in metadata do not match with those in VCF files
Category | First 5 Errors For Category | Link To View All Errors |
---|
Samples described in the metadata but not in the VCF files | SampleA1, SampleA2 , SampleA3, SampleA4, SampleA5 | Show All Errors For Category |
Samples in the VCF files but not described in the metadata | A1Sample , A2Sample, A3Sample, A4Sample, A5Sample | Show All Errors For Category |
All Errors For Category - Samples in the VCF files but not described in the metadata:
- A1Sample•
- •A2Sample
- A3Sample
- A4Sample
- A5Sample
- A6Sample
- A7Sample
- A8Sample
- A9Sample
- A10Sample
Hide ✔ Analysis B: Sample names in metadata match with those in VCF files
▶ ❌ Analysis C: Sample names in metadata do not match with those in VCF files
Category | First 5 Errors For Category | Link To View All Errors |
---|
Samples described in the metadata but not in the VCF files | SampleC1 , SampleC2, SampleC3, SampleC4 | Show All Errors For Category |
Samples in the VCF files but not described in the metadata | C1Sample , C2Sample, C3Sample, C4Sample | Show All Errors For Category |
All Errors For Category - Samples in the VCF files but not described in the metadata:
- C1Sample•
- •C2Sample
- C3Sample
- C4Sample
HideReference genome INSDC check
Checks that the reference sequences in the FASTA file used to call the variants are accessioned in INSDC. Also checks if the reference assembly accession in the metadata matches the one determined from the FASTA file.
metadata_asm_match.fa
✔ All sequences are INSDC accessioned
✔ Analysis A: Assembly accession in metadata is compatible
metadata_asm_not_found.fa
✔ All sequences are INSDC accessioned
▶ ❌ No assembly accession found in metadata
Full report: /path/to/metadata_asm_not_found.yml
Category | Accessions |
---|
Assembly accession found in metadata | Not found |
Assembly accession(s) compatible with FASTA | GCA_1 |
metadata_asm_not_match.fa
✔ All sequences are INSDC accessioned
▶ ❌ Analysis B: Assembly accession in metadata is not compatible
Full report: /path/to/metadata_asm_not_match.yml
Category | Accessions |
---|
Assembly accession found in metadata | GCA_2 |
Assembly accession(s) compatible with FASTA | GCA_1 |
metadata_error.fa
Warning: The following results may be incomplete due to problems with external services. Please try again later for complete results.
Error message: 500 Server Error: Internal Server Error for url: https://www.ebi.ac.uk/eva/webservices/contig-alias/v1/chromosomes/md5checksum/hjfdoijsfc47hfg0gh9qwjrve
✔ All sequences are INSDC accessioned
✔ Analysis C: Assembly accession in metadata is compatible
not_all_insdc.fa
▶ ❌ Some sequences are not INSDC accessioned
First 10 sequences not in INSDC. Full report: /path/to/not_all_insdc_check.yml
Sequence name | Refget md5 |
---|
2 | hjfdoijsfc47hfg0gh9qwjrve |
✔ Analysis A: Assembly accession in metadata is compatible
\ No newline at end of file
+ .error-list, .no-show { display: none; }
+
+
+
+
+
+ Project Summary
+
+ General details about the project
+
+
+
Project Title: My cool project
+
Validation Date: 2023-08-31 12:34:56
+
Submission Directory: /test/submission/dir
+
+ ▶ Files mapping
+
+
+
+ VCF File |
+ Fasta File |
+ Analysis |
+
+
+ input_fail.vcf |
+ input_fail.fa |
+ A |
+
+
+ input_pass.vcf |
+ input_pass.fa |
+ B |
+
+
+ input_test.vcf |
+ input_test.fa |
+ could not be linked |
+
+
+
+
+
+
+ Metadata validation results
+
+ Ensures that required fields are present and values are formatted correctly.
+ For requirements, please refer to the
EVA website.
+
+ ▶ ❌ Metadata validation check
+
+
Full report: /path/to/json/metadata/report
+
+
+ JSON Property | Error Description |
+
+
+ .files |
+ should have required property 'files' |
+
+
+ /project.title |
+ should have required property 'title' |
+
+
+ /project.description |
+ should have required property 'description' |
+
+
+ /project.taxId |
+ should have required property 'taxId' |
+
+
+ /project.centre |
+ should have required property 'centre' |
+
+
+ /analysis/0.analysisTitle |
+ should have required property 'analysisTitle' |
+
+
+ /analysis/0.description |
+ should have required property 'description' |
+
+
+ /analysis/0.experimentType |
+ should have required property 'experimentType' |
+
+
+ /analysis/0.referenceGenome |
+ should have required property 'referenceGenome' |
+
+
+ /sample/0.bioSampleAccession |
+ should have required property 'bioSampleAccession' |
+
+
+ /sample/0.bioSampleObject |
+ should have required property 'bioSampleObject' |
+
+
+ /sample/0 |
+ should match exactly one schema in oneOf |
+
+
+
+
+
+ VCF validation results
+
+ Checks whether each file is compliant with the
VCF specification.
+ Also checks whether the variants' reference alleles match against the reference assembly.
+
+ input_fail.vcf
+ ▶ ❌ Assembly check: 26/36 (72.22%)
+
+
First 10 errors per category are below. Full report: /path/to/assembly_failed/report
+
+
+ Category | Error |
+
+
+ Parsing Error | The assembly checking could not be completed: Contig 'chr23' not found in assembly report |
+
+
+ mismatch error | Chromosome 1, position 35549, reference allele 'G' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35595, reference allele 'G' does not match the reference sequence, expected 'a' |
+
+
+ mismatch error | Chromosome 1, position 35618, reference allele 'G' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35626, reference allele 'A' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35639, reference allele 'T' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35643, reference allele 'T' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35717, reference allele 'T' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35819, reference allele 'T' does not match the reference sequence, expected 'a' |
+
+
+ mismatch error | Chromosome 1, position 35822, reference allele 'T' does not match the reference sequence, expected 'c' |
+
+
+
+ ▶ ❌ VCF check: 1 critical errors, 1 non-critical errors
+
+
First 10 errors per category are below. Full report: /path/to/vcf_failed/report
+
+
+ Category | Error |
+
+
+ critical error | Line 4: Error in meta-data section. |
+
+
+ non-critical error | Sample #11, field AD does not match the meta specification Number=R (expected 2 value(s)). AD=.. |
+
+
+
+ input_passed.vcf
+ ✔ Assembly check: 247/247 (100.0%)
+ ✔ VCF check: 0 critical errors, 0 non-critical errors
+
+
+ Sample name concordance check
+
+ Checks whether information in the metadata is concordant with that contained in the VCF files, in particular sample names.
+
+ ▶ ❌ Analysis A: Sample names in metadata do not match with those in VCF files
+
+
+
+ Category | First 5 Errors For Category | Link To View All Errors |
+
+
+ Samples described in the metadata but not in the VCF files |
+ SampleA1, SampleA2 , SampleA3, SampleA4, SampleA5 |
+ Show All Errors For Category |
+
+
+ Samples in the VCF files but not described in the metadata |
+ A1Sample , A2Sample, A3Sample, A4Sample, A5Sample |
+ Show All Errors For Category |
+
+
+
+
+
All Errors For Category - Samples in the VCF files but not described in the metadata:
+
+ -
+ A1Sample•
+
+ -
+ •A2Sample
+
+ -
+ A3Sample
+
+ -
+ A4Sample
+
+ -
+ A5Sample
+
+ -
+ A6Sample
+
+ -
+ A7Sample
+
+ -
+ A8Sample
+
+ -
+ A9Sample
+
+ -
+ A10Sample
+
+
+
Hide
+
+
+ ✔ Analysis B: Sample names in metadata match with those in VCF files
+ ▶ ❌ Analysis C: Sample names in metadata do not match with those in VCF files
+
+
+
+ Category | First 5 Errors For Category | Link To View All Errors |
+
+
+ Samples described in the metadata but not in the VCF files |
+ SampleC1 , SampleC2, SampleC3, SampleC4 |
+ Show All Errors For Category |
+
+
+ Samples in the VCF files but not described in the metadata |
+ C1Sample , C2Sample, C3Sample, C4Sample |
+ Show All Errors For Category |
+
+
+
+
+
All Errors For Category - Samples in the VCF files but not described in the metadata:
+
+ -
+ C1Sample•
+
+ -
+ •C2Sample
+
+ -
+ C3Sample
+
+ -
+ C4Sample
+
+
+
Hide
+
+
+
+
+ Reference genome INSDC check
+
+ Checks that the reference sequences in the FASTA file used to call the variants are accessioned in INSDC.
+ Also checks if the reference assembly accession in the metadata matches the one determined from the FASTA file.
+
+ metadata_asm_match.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ✔ Analysis A: Assembly accession in metadata is compatible
+ metadata_asm_not_found.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ▶ ❌ No assembly accession found in metadata
+
+
Full report: /path/to/metadata_asm_not_found.yml
+
+
+ Category | Accessions |
+
+
+ Assembly accession found in metadata |
+ Not found |
+
+
+ Assembly accession(s) compatible with FASTA |
+ GCA_1 |
+
+
+
+ metadata_asm_not_match.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ▶ ❌ Analysis B: Assembly accession in metadata is not compatible
+
+
Full report: /path/to/metadata_asm_not_match.yml
+
+
+ Category | Accessions |
+
+
+ Assembly accession found in metadata |
+ GCA_2 |
+
+
+ Assembly accession(s) compatible with FASTA |
+ GCA_1 |
+
+
+
+ metadata_error.fa
+
+ Warning: The following results may be incomplete due to problems with external services. Please try again later for
+ complete results.
+
Error message: 500 Server Error: Internal Server Error for url: https://www.ebi.ac.uk/eva/webservices/contig-alias/v1/chromosomes/md5checksum/hjfdoijsfc47hfg0gh9qwjrve
+
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ✔ Analysis C: Assembly accession in metadata is compatible
+ not_all_insdc.fa
+
+ ▶ ❌ Some sequences are not INSDC accessioned
+
+
First 10 sequences not in INSDC. Full report: /path/to/not_all_insdc_check.yml
+
+
+ Sequence name | Refget md5 |
+
+
+ 2 | hjfdoijsfc47hfg0gh9qwjrve |
+
+
+
+
+
+ ✔ Analysis A: Assembly accession in metadata is compatible
+
+
+
+
\ No newline at end of file
diff --git a/tests/resources/validation_reports/expected_report_metadata_xlsx.html b/tests/resources/validation_reports/expected_report_metadata_xlsx.html
index fa9e51d..5b84d29 100644
--- a/tests/resources/validation_reports/expected_report_metadata_xlsx.html
+++ b/tests/resources/validation_reports/expected_report_metadata_xlsx.html
@@ -1,22 +1,485 @@
-Validation ReportProject Summary
General details about the project
Project Title: My cool project
Validation Date: 2023-08-31 12:34:56
Submission Directory: /test/submission/dir
▶ Files mapping
VCF File | Fasta File | Analysis |
---|
input_fail.vcf | input_fail.fa | A |
input_pass.vcf | input_pass.fa | B |
input_test.vcf | input_test.fa | could not be linked |
Metadata validation results
Ensures that required fields are present and values are formatted correctly. For requirements, please refer to the
EVA website.
▶ ❌ Metadata validation check
Full report: /path/to/metadata/metadata_spreadsheet_validation.txt
Sheet | Row | Column | Description |
---|
Files | | | Sheet "Files" is missing |
Project | 2 | Project Title | Column "Project Title" is not populated |
Project | 2 | Description | Column "Description" is not populated |
Project | 2 | Tax ID | Column "Tax ID" is not populated |
Project | 2 | Center | Column "Center" is not populated |
Analysis | 2 | Analysis Title | Column "Analysis Title" is not populated |
Analysis | 2 | Description | Column "Description" is not populated |
Analysis | 2 | Experiment Type | Column "Experiment Type" is not populated |
Analysis | 2 | Reference | Column "Reference" is not populated |
Sample | 3 | Sample Accession | Column "Sample Accession" is not populated |
VCF validation results
Checks whether each file is compliant with the
VCF specification. Also checks whether the variants' reference alleles match against the reference assembly.
input_fail.vcf
▶ ❌ Assembly check: 26/36 (72.22%)
First 10 errors per category are below. Full report: /path/to/assembly_failed/report
Category | Error |
---|
Parsing Error | The assembly checking could not be completed: Contig 'chr23' not found in assembly report |
mismatch error | Chromosome 1, position 35549, reference allele 'G' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35595, reference allele 'G' does not match the reference sequence, expected 'a' |
mismatch error | Chromosome 1, position 35618, reference allele 'G' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35626, reference allele 'A' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35639, reference allele 'T' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35643, reference allele 'T' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35717, reference allele 'T' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35819, reference allele 'T' does not match the reference sequence, expected 'a' |
mismatch error | Chromosome 1, position 35822, reference allele 'T' does not match the reference sequence, expected 'c' |
▶ ❌ VCF check: 1 critical errors, 1 non-critical errors
First 10 errors per category are below. Full report: /path/to/vcf_failed/report
Category | Error |
---|
critical error | Line 4: Error in meta-data section. |
non-critical error | Sample #11, field AD does not match the meta specification Number=R (expected 2 value(s)). AD=.. |
input_passed.vcf
✔ Assembly check: 247/247 (100.0%)
✔ VCF check: 0 critical errors, 0 non-critical errors
Sample name concordance check
Checks whether information in the metadata is concordant with that contained in the VCF files, in particular sample names.
▶ ❌ Analysis A: Sample names in metadata do not match with those in VCF files
Category | First 5 Errors For Category | Link To View All Errors |
---|
Samples described in the metadata but not in the VCF files | SampleA1, SampleA2 , SampleA3, SampleA4, SampleA5 | Show All Errors For Category |
Samples in the VCF files but not described in the metadata | A1Sample , A2Sample, A3Sample, A4Sample, A5Sample | Show All Errors For Category |
All Errors For Category - Samples in the VCF files but not described in the metadata:
- A1Sample•
- •A2Sample
- A3Sample
- A4Sample
- A5Sample
- A6Sample
- A7Sample
- A8Sample
- A9Sample
- A10Sample
Hide ✔ Analysis B: Sample names in metadata match with those in VCF files
▶ ❌ Analysis C: Sample names in metadata do not match with those in VCF files
Category | First 5 Errors For Category | Link To View All Errors |
---|
Samples described in the metadata but not in the VCF files | SampleC1 , SampleC2, SampleC3, SampleC4 | Show All Errors For Category |
Samples in the VCF files but not described in the metadata | C1Sample , C2Sample, C3Sample, C4Sample | Show All Errors For Category |
All Errors For Category - Samples in the VCF files but not described in the metadata:
- C1Sample•
- •C2Sample
- C3Sample
- C4Sample
HideReference genome INSDC check
Checks that the reference sequences in the FASTA file used to call the variants are accessioned in INSDC. Also checks if the reference assembly accession in the metadata matches the one determined from the FASTA file.
metadata_asm_match.fa
✔ All sequences are INSDC accessioned
✔ Analysis A: Assembly accession in metadata is compatible
metadata_asm_not_found.fa
✔ All sequences are INSDC accessioned
▶ ❌ No assembly accession found in metadata
Full report: /path/to/metadata_asm_not_found.yml
Category | Accessions |
---|
Assembly accession found in metadata | Not found |
Assembly accession(s) compatible with FASTA | GCA_1 |
metadata_asm_not_match.fa
✔ All sequences are INSDC accessioned
▶ ❌ Analysis B: Assembly accession in metadata is not compatible
Full report: /path/to/metadata_asm_not_match.yml
Category | Accessions |
---|
Assembly accession found in metadata | GCA_2 |
Assembly accession(s) compatible with FASTA | GCA_1 |
metadata_error.fa
Warning: The following results may be incomplete due to problems with external services. Please try again later for complete results.
Error message: 500 Server Error: Internal Server Error for url: https://www.ebi.ac.uk/eva/webservices/contig-alias/v1/chromosomes/md5checksum/hjfdoijsfc47hfg0gh9qwjrve
✔ All sequences are INSDC accessioned
✔ Analysis C: Assembly accession in metadata is compatible
not_all_insdc.fa
▶ ❌ Some sequences are not INSDC accessioned
First 10 sequences not in INSDC. Full report: /path/to/not_all_insdc_check.yml
Sequence name | Refget md5 |
---|
2 | hjfdoijsfc47hfg0gh9qwjrve |
✔ Analysis A: Assembly accession in metadata is compatible
\ No newline at end of file
+ .error-list, .no-show { display: none; }
+
+
+
+
+
+ Project Summary
+
+ General details about the project
+
+
+
Project Title: My cool project
+
Validation Date: 2023-08-31 12:34:56
+
Submission Directory: /test/submission/dir
+
+ ▶ Files mapping
+
+
+
+ VCF File |
+ Fasta File |
+ Analysis |
+
+
+ input_fail.vcf |
+ input_fail.fa |
+ A |
+
+
+ input_pass.vcf |
+ input_pass.fa |
+ B |
+
+
+ input_test.vcf |
+ input_test.fa |
+ could not be linked |
+
+
+
+
+
+
+ Metadata validation results
+
+ Ensures that required fields are present and values are formatted correctly.
+ For requirements, please refer to the
EVA website.
+
+ ▶ ❌ Metadata validation check
+
+
Full report: /path/to/metadata/metadata_spreadsheet_validation.txt
+
+
+ Sheet | Row | Column | Description |
+
+
+ Files |
+ |
+ |
+ Sheet "Files" is missing |
+
+
+ Project |
+ 2 |
+ Project Title |
+ Column "Project Title" is not populated |
+
+
+ Project |
+ 2 |
+ Description |
+ Column "Description" is not populated |
+
+
+ Project |
+ 2 |
+ Tax ID |
+ Column "Tax ID" is not populated |
+
+
+ Project |
+ 2 |
+ Center |
+ Column "Center" is not populated |
+
+
+ Analysis |
+ 2 |
+ Analysis Title |
+ Column "Analysis Title" is not populated |
+
+
+ Analysis |
+ 2 |
+ Description |
+ Column "Description" is not populated |
+
+
+ Analysis |
+ 2 |
+ Experiment Type |
+ Column "Experiment Type" is not populated |
+
+
+ Analysis |
+ 2 |
+ Reference |
+ Column "Reference" is not populated |
+
+
+ Sample |
+ 3 |
+ Sample Accession |
+ Column "Sample Accession" is not populated |
+
+
+
+
+
+ VCF validation results
+
+ Checks whether each file is compliant with the
VCF specification.
+ Also checks whether the variants' reference alleles match against the reference assembly.
+
+ input_fail.vcf
+ ▶ ❌ Assembly check: 26/36 (72.22%)
+
+
First 10 errors per category are below. Full report: /path/to/assembly_failed/report
+
+
+ Category | Error |
+
+
+ Parsing Error | The assembly checking could not be completed: Contig 'chr23' not found in assembly report |
+
+
+ mismatch error | Chromosome 1, position 35549, reference allele 'G' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35595, reference allele 'G' does not match the reference sequence, expected 'a' |
+
+
+ mismatch error | Chromosome 1, position 35618, reference allele 'G' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35626, reference allele 'A' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35639, reference allele 'T' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35643, reference allele 'T' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35717, reference allele 'T' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35819, reference allele 'T' does not match the reference sequence, expected 'a' |
+
+
+ mismatch error | Chromosome 1, position 35822, reference allele 'T' does not match the reference sequence, expected 'c' |
+
+
+
+ ▶ ❌ VCF check: 1 critical errors, 1 non-critical errors
+
+
First 10 errors per category are below. Full report: /path/to/vcf_failed/report
+
+
+ Category | Error |
+
+
+ critical error | Line 4: Error in meta-data section. |
+
+
+ non-critical error | Sample #11, field AD does not match the meta specification Number=R (expected 2 value(s)). AD=.. |
+
+
+
+ input_passed.vcf
+ ✔ Assembly check: 247/247 (100.0%)
+ ✔ VCF check: 0 critical errors, 0 non-critical errors
+
+
+ Sample name concordance check
+
+ Checks whether information in the metadata is concordant with that contained in the VCF files, in particular sample names.
+
+ ▶ ❌ Analysis A: Sample names in metadata do not match with those in VCF files
+
+
+
+ Category | First 5 Errors For Category | Link To View All Errors |
+
+
+ Samples described in the metadata but not in the VCF files |
+ SampleA1, SampleA2 , SampleA3, SampleA4, SampleA5 |
+ Show All Errors For Category |
+
+
+ Samples in the VCF files but not described in the metadata |
+ A1Sample , A2Sample, A3Sample, A4Sample, A5Sample |
+ Show All Errors For Category |
+
+
+
+
+
All Errors For Category - Samples in the VCF files but not described in the metadata:
+
+ -
+ A1Sample•
+
+ -
+ •A2Sample
+
+ -
+ A3Sample
+
+ -
+ A4Sample
+
+ -
+ A5Sample
+
+ -
+ A6Sample
+
+ -
+ A7Sample
+
+ -
+ A8Sample
+
+ -
+ A9Sample
+
+ -
+ A10Sample
+
+
+
Hide
+
+
+ ✔ Analysis B: Sample names in metadata match with those in VCF files
+ ▶ ❌ Analysis C: Sample names in metadata do not match with those in VCF files
+
+
+
+ Category | First 5 Errors For Category | Link To View All Errors |
+
+
+ Samples described in the metadata but not in the VCF files |
+ SampleC1 , SampleC2, SampleC3, SampleC4 |
+ Show All Errors For Category |
+
+
+ Samples in the VCF files but not described in the metadata |
+ C1Sample , C2Sample, C3Sample, C4Sample |
+ Show All Errors For Category |
+
+
+
+
+
All Errors For Category - Samples in the VCF files but not described in the metadata:
+
+ -
+ C1Sample•
+
+ -
+ •C2Sample
+
+ -
+ C3Sample
+
+ -
+ C4Sample
+
+
+
Hide
+
+
+
+
+ Reference genome INSDC check
+
+ Checks that the reference sequences in the FASTA file used to call the variants are accessioned in INSDC.
+ Also checks if the reference assembly accession in the metadata matches the one determined from the FASTA file.
+
+ metadata_asm_match.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ✔ Analysis A: Assembly accession in metadata is compatible
+ metadata_asm_not_found.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ▶ ❌ No assembly accession found in metadata
+
+
Full report: /path/to/metadata_asm_not_found.yml
+
+
+ Category | Accessions |
+
+
+ Assembly accession found in metadata |
+ Not found |
+
+
+ Assembly accession(s) compatible with FASTA |
+ GCA_1 |
+
+
+
+ metadata_asm_not_match.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ▶ ❌ Analysis B: Assembly accession in metadata is not compatible
+
+
Full report: /path/to/metadata_asm_not_match.yml
+
+
+ Category | Accessions |
+
+
+ Assembly accession found in metadata |
+ GCA_2 |
+
+
+ Assembly accession(s) compatible with FASTA |
+ GCA_1 |
+
+
+
+ metadata_error.fa
+
+ Warning: The following results may be incomplete due to problems with external services. Please try again later for
+ complete results.
+
Error message: 500 Server Error: Internal Server Error for url: https://www.ebi.ac.uk/eva/webservices/contig-alias/v1/chromosomes/md5checksum/hjfdoijsfc47hfg0gh9qwjrve
+
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ✔ Analysis C: Assembly accession in metadata is compatible
+ not_all_insdc.fa
+
+ ▶ ❌ Some sequences are not INSDC accessioned
+
+
First 10 sequences not in INSDC. Full report: /path/to/not_all_insdc_check.yml
+
+
+ Sequence name | Refget md5 |
+
+
+ 2 | hjfdoijsfc47hfg0gh9qwjrve |
+
+
+
+
+
+ ✔ Analysis A: Assembly accession in metadata is compatible
+
+
+
+
\ No newline at end of file
diff --git a/tests/resources/validation_reports/expected_shallow_metadata_xlsx_report.html b/tests/resources/validation_reports/expected_shallow_metadata_xlsx_report.html
index 5fc4e3a..fd81a86 100644
--- a/tests/resources/validation_reports/expected_shallow_metadata_xlsx_report.html
+++ b/tests/resources/validation_reports/expected_shallow_metadata_xlsx_report.html
@@ -1,22 +1,509 @@
-Validation Report▶ ❌ You requested to run the shallow validation, please run full validation before submitting the data
VCF File | Variant lines validated in VCF | Entries used in Fasta |
---|
input_fail.vcf | 10000 | 24 |
input_passed.vcf | 10000 | 24 |
Project Summary
General details about the project
Project Title: My cool project
Validation Date: 2023-08-31 12:34:56
Submission Directory: /test/submission/dir
▶ Files mapping
VCF File | Fasta File | Analysis |
---|
input_fail.vcf | input_fail.fa | A |
input_pass.vcf | input_pass.fa | B |
input_test.vcf | input_test.fa | could not be linked |
Metadata validation results
Ensures that required fields are present and values are formatted correctly. For requirements, please refer to the
EVA website.
▶ ❌ Metadata validation check
Full report: /path/to/metadata/metadata_spreadsheet_validation.txt
Sheet | Row | Column | Description |
---|
Files | | | Sheet "Files" is missing |
Project | 2 | Project Title | Column "Project Title" is not populated |
Project | 2 | Description | Column "Description" is not populated |
Project | 2 | Tax ID | Column "Tax ID" is not populated |
Project | 2 | Center | Column "Center" is not populated |
Analysis | 2 | Analysis Title | Column "Analysis Title" is not populated |
Analysis | 2 | Description | Column "Description" is not populated |
Analysis | 2 | Experiment Type | Column "Experiment Type" is not populated |
Analysis | 2 | Reference | Column "Reference" is not populated |
Sample | 3 | Sample Accession | Column "Sample Accession" is not populated |
VCF validation results
Checks whether each file is compliant with the
VCF specification. Also checks whether the variants' reference alleles match against the reference assembly.
input_fail.vcf
▶ ❌ Assembly check: 26/36 (72.22%)
First 10 errors per category are below. Full report: /path/to/assembly_failed/report
Category | Error |
---|
Parsing Error | The assembly checking could not be completed: Contig 'chr23' not found in assembly report |
mismatch error | Chromosome 1, position 35549, reference allele 'G' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35595, reference allele 'G' does not match the reference sequence, expected 'a' |
mismatch error | Chromosome 1, position 35618, reference allele 'G' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35626, reference allele 'A' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35639, reference allele 'T' does not match the reference sequence, expected 'c' |
mismatch error | Chromosome 1, position 35643, reference allele 'T' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35717, reference allele 'T' does not match the reference sequence, expected 'g' |
mismatch error | Chromosome 1, position 35819, reference allele 'T' does not match the reference sequence, expected 'a' |
mismatch error | Chromosome 1, position 35822, reference allele 'T' does not match the reference sequence, expected 'c' |
▶ ❌ VCF check: 1 critical errors, 1 non-critical errors
First 10 errors per category are below. Full report: /path/to/vcf_failed/report
Category | Error |
---|
critical error | Line 4: Error in meta-data section. |
non-critical error | Sample #11, field AD does not match the meta specification Number=R (expected 2 value(s)). AD=.. |
input_passed.vcf
✔ Assembly check: 247/247 (100.0%)
✔ VCF check: 0 critical errors, 0 non-critical errors
Sample name concordance check
Checks whether information in the metadata is concordant with that contained in the VCF files, in particular sample names.
▶ ❌ Analysis A: Sample names in metadata do not match with those in VCF files
Category | First 5 Errors For Category | Link To View All Errors |
---|
Samples described in the metadata but not in the VCF files | SampleA1, SampleA2 , SampleA3, SampleA4, SampleA5 | Show All Errors For Category |
Samples in the VCF files but not described in the metadata | A1Sample , A2Sample, A3Sample, A4Sample, A5Sample | Show All Errors For Category |
All Errors For Category - Samples in the VCF files but not described in the metadata:
- A1Sample•
- •A2Sample
- A3Sample
- A4Sample
- A5Sample
- A6Sample
- A7Sample
- A8Sample
- A9Sample
- A10Sample
Hide
✔ Analysis B: Sample names in metadata match with those in VCF files
▶ ❌ Analysis C: Sample names in metadata do not match with those in VCF files
Category | First 5 Errors For Category | Link To View All Errors |
---|
Samples described in the metadata but not in the VCF files | SampleC1 , SampleC2, SampleC3, SampleC4 | Show All Errors For Category |
Samples in the VCF files but not described in the metadata | C1Sample , C2Sample, C3Sample, C4Sample | Show All Errors For Category |
All Errors For Category - Samples in the VCF files but not described in the metadata:
- C1Sample•
- •C2Sample
- C3Sample
- C4Sample
Hide
Reference genome INSDC check
Checks that the reference sequences in the FASTA file used to call the variants are accessioned in INSDC. Also checks if the reference assembly accession in the metadata matches the one determined from the FASTA file.
metadata_asm_match.fa
✔ All sequences are INSDC accessioned
✔ Analysis A: Assembly accession in metadata is compatible
metadata_asm_not_found.fa
✔ All sequences are INSDC accessioned
▶ ❌ No assembly accession found in metadata
Full report: /path/to/metadata_asm_not_found.yml
Category | Accessions |
---|
Assembly accession found in metadata | Not found |
Assembly accession(s) compatible with FASTA | GCA_1 |
metadata_asm_not_match.fa
✔ All sequences are INSDC accessioned
▶ ❌ Analysis B: Assembly accession in metadata is not compatible
Full report: /path/to/metadata_asm_not_match.yml
Category | Accessions |
---|
Assembly accession found in metadata | GCA_2 |
Assembly accession(s) compatible with FASTA | GCA_1 |
metadata_error.fa
Warning: The following results may be incomplete due to problems with external services. Please try again later for complete results.
Error message: 500 Server Error: Internal Server Error for url: https://www.ebi.ac.uk/eva/webservices/contig-alias/v1/chromosomes/md5checksum/hjfdoijsfc47hfg0gh9qwjrve
✔ All sequences are INSDC accessioned
✔ Analysis C: Assembly accession in metadata is compatible
not_all_insdc.fa
▶ ❌ Some sequences are not INSDC accessioned
First 10 sequences not in INSDC. Full report: /path/to/not_all_insdc_check.yml
Sequence name | Refget md5 |
---|
2 | hjfdoijsfc47hfg0gh9qwjrve |
✔ Analysis A: Assembly accession in metadata is compatible
\ No newline at end of file
+ .error-list, .no-show { display: none; }
+
+
+
+
+
+ ▶
+ ❌ You requested to run the shallow validation, please run full validation before submitting the data
+
+
+
+
+ VCF File |
+ Variant lines validated in VCF |
+ Entries used in Fasta |
+
+
+ input_fail.vcf |
+ 10000 |
+ 24 |
+
+
+ input_passed.vcf |
+ 10000 |
+ 24 |
+
+
+
+
+
+ Project Summary
+
+ General details about the project
+
+
+
Project Title: My cool project
+
Validation Date: 2023-08-31 12:34:56
+
Submission Directory: /test/submission/dir
+
+ ▶ Files mapping
+
+
+
+ VCF File |
+ Fasta File |
+ Analysis |
+
+
+ input_fail.vcf |
+ input_fail.fa |
+ A |
+
+
+ input_pass.vcf |
+ input_pass.fa |
+ B |
+
+
+ input_test.vcf |
+ input_test.fa |
+ could not be linked |
+
+
+
+
+
+
+ Metadata validation results
+
+ Ensures that required fields are present and values are formatted correctly.
+ For requirements, please refer to the
EVA website.
+
+ ▶ ❌ Metadata validation check
+
+
Full report: /path/to/metadata/metadata_spreadsheet_validation.txt
+
+
+ Sheet | Row | Column | Description |
+
+
+ Files |
+ |
+ |
+ Sheet "Files" is missing |
+
+
+ Project |
+ 2 |
+ Project Title |
+ Column "Project Title" is not populated |
+
+
+ Project |
+ 2 |
+ Description |
+ Column "Description" is not populated |
+
+
+ Project |
+ 2 |
+ Tax ID |
+ Column "Tax ID" is not populated |
+
+
+ Project |
+ 2 |
+ Center |
+ Column "Center" is not populated |
+
+
+ Analysis |
+ 2 |
+ Analysis Title |
+ Column "Analysis Title" is not populated |
+
+
+ Analysis |
+ 2 |
+ Description |
+ Column "Description" is not populated |
+
+
+ Analysis |
+ 2 |
+ Experiment Type |
+ Column "Experiment Type" is not populated |
+
+
+ Analysis |
+ 2 |
+ Reference |
+ Column "Reference" is not populated |
+
+
+ Sample |
+ 3 |
+ Sample Accession |
+ Column "Sample Accession" is not populated |
+
+
+
+
+
+ VCF validation results
+
+ Checks whether each file is compliant with the
VCF specification.
+ Also checks whether the variants' reference alleles match against the reference assembly.
+
+ input_fail.vcf
+ ▶ ❌ Assembly check: 26/36 (72.22%)
+
+
First 10 errors per category are below. Full report: /path/to/assembly_failed/report
+
+
+ Category | Error |
+
+
+ Parsing Error | The assembly checking could not be completed: Contig 'chr23' not found in assembly report |
+
+
+ mismatch error | Chromosome 1, position 35549, reference allele 'G' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35595, reference allele 'G' does not match the reference sequence, expected 'a' |
+
+
+ mismatch error | Chromosome 1, position 35618, reference allele 'G' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35626, reference allele 'A' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35639, reference allele 'T' does not match the reference sequence, expected 'c' |
+
+
+ mismatch error | Chromosome 1, position 35643, reference allele 'T' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35717, reference allele 'T' does not match the reference sequence, expected 'g' |
+
+
+ mismatch error | Chromosome 1, position 35819, reference allele 'T' does not match the reference sequence, expected 'a' |
+
+
+ mismatch error | Chromosome 1, position 35822, reference allele 'T' does not match the reference sequence, expected 'c' |
+
+
+
+ ▶ ❌ VCF check: 1 critical errors, 1 non-critical errors
+
+
First 10 errors per category are below. Full report: /path/to/vcf_failed/report
+
+
+ Category | Error |
+
+
+ critical error | Line 4: Error in meta-data section. |
+
+
+ non-critical error | Sample #11, field AD does not match the meta specification Number=R (expected 2 value(s)). AD=.. |
+
+
+
+ input_passed.vcf
+ ✔ Assembly check: 247/247 (100.0%)
+ ✔ VCF check: 0 critical errors, 0 non-critical errors
+
+
+ Sample name concordance check
+
+ Checks whether information in the metadata is concordant with that contained in the VCF files, in particular sample names.
+
+ ▶ ❌ Analysis A: Sample names in metadata do not match with those in VCF files
+
+
+
+ Category | First 5 Errors For Category | Link To View All Errors |
+
+
+ Samples described in the metadata but not in the VCF files |
+ SampleA1, SampleA2 , SampleA3, SampleA4, SampleA5 |
+ Show All Errors For Category |
+
+
+ Samples in the VCF files but not described in the metadata |
+ A1Sample , A2Sample, A3Sample, A4Sample, A5Sample |
+ Show All Errors For Category |
+
+
+
+
+
All Errors For Category - Samples in the VCF files but not described in the metadata:
+
+ -
+ A1Sample•
+
+ -
+ •A2Sample
+
+ -
+ A3Sample
+
+ -
+ A4Sample
+
+ -
+ A5Sample
+
+ -
+ A6Sample
+
+ -
+ A7Sample
+
+ -
+ A8Sample
+
+ -
+ A9Sample
+
+ -
+ A10Sample
+
+
+
Hide
+
+
+ ✔ Analysis B: Sample names in metadata match with those in VCF files
+ ▶ ❌ Analysis C: Sample names in metadata do not match with those in VCF files
+
+
+
+ Category | First 5 Errors For Category | Link To View All Errors |
+
+
+ Samples described in the metadata but not in the VCF files |
+ SampleC1 , SampleC2, SampleC3, SampleC4 |
+ Show All Errors For Category |
+
+
+ Samples in the VCF files but not described in the metadata |
+ C1Sample , C2Sample, C3Sample, C4Sample |
+ Show All Errors For Category |
+
+
+
+
+
All Errors For Category - Samples in the VCF files but not described in the metadata:
+
+ -
+ C1Sample•
+
+ -
+ •C2Sample
+
+ -
+ C3Sample
+
+ -
+ C4Sample
+
+
+
Hide
+
+
+
+
+ Reference genome INSDC check
+
+ Checks that the reference sequences in the FASTA file used to call the variants are accessioned in INSDC.
+ Also checks if the reference assembly accession in the metadata matches the one determined from the FASTA file.
+
+ metadata_asm_match.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ✔ Analysis A: Assembly accession in metadata is compatible
+ metadata_asm_not_found.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ▶ ❌ No assembly accession found in metadata
+
+
Full report: /path/to/metadata_asm_not_found.yml
+
+
+ Category | Accessions |
+
+
+ Assembly accession found in metadata |
+ Not found |
+
+
+ Assembly accession(s) compatible with FASTA |
+ GCA_1 |
+
+
+
+ metadata_asm_not_match.fa
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ▶ ❌ Analysis B: Assembly accession in metadata is not compatible
+
+
Full report: /path/to/metadata_asm_not_match.yml
+
+
+ Category | Accessions |
+
+
+ Assembly accession found in metadata |
+ GCA_2 |
+
+
+ Assembly accession(s) compatible with FASTA |
+ GCA_1 |
+
+
+
+ metadata_error.fa
+
+ Warning: The following results may be incomplete due to problems with external services. Please try again later for
+ complete results.
+
Error message: 500 Server Error: Internal Server Error for url: https://www.ebi.ac.uk/eva/webservices/contig-alias/v1/chromosomes/md5checksum/hjfdoijsfc47hfg0gh9qwjrve
+
+
+ ✔ All sequences are INSDC accessioned
+
+
+ ✔ Analysis C: Assembly accession in metadata is compatible
+ not_all_insdc.fa
+
+ ▶ ❌ Some sequences are not INSDC accessioned
+
+
First 10 sequences not in INSDC. Full report: /path/to/not_all_insdc_check.yml
+
+
+ Sequence name | Refget md5 |
+
+
+ 2 | hjfdoijsfc47hfg0gh9qwjrve |
+
+
+
+
+
+ ✔ Analysis A: Assembly accession in metadata is compatible
+
+
+
+
\ No newline at end of file