So seqr does not support Pisces as a caller; we really only support GATK and DRAGEN. If you do want to try to load this VCF anyway, you can try running it with validation skipped. Skipping validation is not supported in the helper scripts we provide for running the pipeline, so you need to format the command manually. You can see an example of how to do that here: #3207 (comment)
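A sketch of such a manually formatted command, adapted from the command posted in the question below. Every continuation line ends in a backslash, including the one before `--dont-validate` (the flag name is taken from the question's command and matches the `dont_validate` task parameter shown in its log). It assumes `BUILD_VERSION`, `SAMPLE_TYPE`, `INDEX_NAME` and `INPUT_FILE_PATH` are set as they would be for `load_data.sh`, and that `SeqrMTToESTask` forwards the flag to the `SeqrVCFToMTTask` step where validation runs. The `load_without_validation` wrapper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the question's seqr_loading command with validation skipped.
# Every line except the last ends in a backslash, so --dont-validate is passed
# to the pipeline instead of being run by the shell as a separate command.
# Assumes BUILD_VERSION, SAMPLE_TYPE, INDEX_NAME and INPUT_FILE_PATH are set,
# as for load_data.sh. The function name is hypothetical.
# ${INPUT_FILE_PATH/.*/} removes everything from the first "." (e.g. ".vcf.gz"),
# so GRCh37/sample.vcf.gz becomes GRCh37/sample before ".mt" is appended.
load_without_validation() {
  docker-compose exec pipeline-runner python3 -m seqr_loading SeqrMTToESTask --local-scheduler \
    --reference-ht-path "/seqr-reference-data/GRCh${BUILD_VERSION}/combined_reference_data_grch${BUILD_VERSION}.ht" \
    --clinvar-ht-path "/seqr-reference-data/GRCh${BUILD_VERSION}/clinvar.GRCh${BUILD_VERSION}.ht" \
    --vep-config-json-path "/vep_configs/vep-GRCh${BUILD_VERSION}-loftee.json" \
    --es-host elasticsearch \
    --es-index-min-num-shards 1 \
    --sample-type "${SAMPLE_TYPE}" \
    --es-index "${INDEX_NAME}" \
    --genome-version "${BUILD_VERSION}" \
    --source-paths "/input_vcfs/${INPUT_FILE_PATH}" \
    --dest-path "/input_vcfs/${INPUT_FILE_PATH/.*/}.mt" \
    --dont-validate
}
```

Once the four variables are set, calling `load_without_validation` runs the command. The function wrapper is only there so the full argument list can be inspected and reused; pasting the body directly into the shell works the same way.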
-
Hi Team,
We were trying to import our VCF into our local instance, and while doing so we got the error "SeqrValidationError: Genome version validation error: dataset specified as GRCh37 but doesn't contain the expected number of common GRCh37 variants."
As you can see in the screenshot, we used Pisces 5.2.11.63, and the reference genome we used is hg19.
We tried to load the VCF using "docker-compose exec pipeline-runner load_data.sh $BUILD_VERSION $SAMPLE_TYPE $INDEX_NAME $INPUT_FILE_PATH" and encountered the error shown below.
We also tried adding the --dont-validate parameter, but we still get the same error:
docker-compose exec pipeline-runner python3 -m seqr_loading SeqrMTToESTask --local-scheduler \
--reference-ht-path "/seqr-reference-data/GRCh${BUILD_VERSION}/combined_reference_data_grch${BUILD_VERSION}.ht" \
--clinvar-ht-path "/seqr-reference-data/GRCh${BUILD_VERSION}/clinvar.GRCh${BUILD_VERSION}.ht" \
--vep-config-json-path "/vep_configs/vep-GRCh${BUILD_VERSION}-loftee.json" \
--es-host elasticsearch \
--es-index-min-num-shards 1 \
--sample-type "${SAMPLE_TYPE}" \
--es-index "${INDEX_NAME}" \
--genome-version "${BUILD_VERSION}" \
--source-paths "/input_vcfs/${INPUT_FILE_PATH}" \
--dest-path "/input_vcfs/${INPUT_FILE_PATH/.*/}.mt"
--dont-validate
LOGGING: writing to /hail-20240118-0929-0.2.122-be9d88a80695.log
{'_Task__hash': 8127482535013454491,
'clinvar_ht_path': '/seqr-reference-data/GRCh37/clinvar.GRCh37.ht',
'dataset_type': 'VARIANTS',
'decrease_running_resources': <bound method TaskStatusReporter.decrease_running_resources of <luigi.worker.TaskStatusReporter object at 0x7f08b671d450>>,
'dest_path': '/input_vcfs/GRCh37/STRAN-2020-21450.mt',
'dont_validate': False,
'genome_version': '37',
'grch38_to_grch37_ref_chain': 'gs://hail-common/references/grch38_to_grch37.over.chain.gz',
'hail_temp_dir': None,
'hgmd_ht_path': None,
'ignore_missing_samples_when_remapping': False,
'ignore_missing_samples_when_subsetting': False,
'interval_ref_ht_path': None,
'param_kwargs': {'clinvar_ht_path': '/seqr-reference-data/GRCh37/clinvar.GRCh37.ht',
'dataset_type': 'VARIANTS',
'dest_path': '/input_vcfs/GRCh37/STRAN-2020-21450.mt',
'dont_validate': False,
'genome_version': '37',
'grch38_to_grch37_ref_chain': 'gs://hail-common/references/grch38_to_grch37.over.chain.gz',
'hail_temp_dir': None,
'hgmd_ht_path': None,
'ignore_missing_samples_when_remapping': False,
'ignore_missing_samples_when_subsetting': False,
'interval_ref_ht_path': None,
'reference_ht_path': '/seqr-reference-data/GRCh37/combined_reference_data_grch37.ht',
'remap_path': None,
'sample_type': 'WES',
'source_paths': '/input_vcfs/GRCh37/STRAN-2020-21450.vcf.gz',
'subset_path': None,
'vep_config_json_path': '/vep_configs/vep-GRCh37-loftee.json',
'reference_ht_path': '/seqr-reference-data/GRCh37/combined_reference_data_grch37.ht',
'remap_path': None,
'sample_type': 'WES',
'scheduler_messages': None,
'set_progress_percentage': <bound method TaskStatusReporter.update_progress_percentage of <luigi.worker.TaskStatusReporter object at 0x7f08b671d450>>,
'set_status_message': <bound method TaskStatusReporter.update_status_message of <luigi.worker.TaskStatusReporter object at 0x7f08b671d450>>,
'set_tracking_url': <bound method TaskStatusReporter.update_tracking_url of <luigi.worker.TaskStatusReporter object at 0x7f08b671d450>>,
'source_paths': ['/input_vcfs/GRCh37/STRAN-2020-21450.vcf.gz'],
'subset_path': None,
'task_id': 'SeqrVCFToMTTask__seqr_reference__VARIANTS__input_vcfs_GRCh_a332994e88',
'vep_config_json_path': '/vep_configs/vep-GRCh37-loftee.json',
'vep_runner': 'VEP'}
2024-01-18 09:29:11.452 Hail: INFO: scanning VCF for sortedness...
2024-01-18 09:29:21.658 Hail: INFO: Coerced sorted VCF - no additional import work to do
ERROR: [pid 33790] Worker Worker(salt=7834400869, workers=1, host=98fd7f59ccec, username=root, pid=33790) failed SeqrVCFToMTTask(source_paths=/input_vcfs/GRCh37/STRAN-2020-21450.vcf.gz, dest_path=/input_vcfs/GRCh37/STRAN-2020-21450.mt, genome_version=37, vep_runner=VEP, ignore_missing_samples_when_remapping=False, ignore_missing_samples_when_subsetting=False, reference_ht_path=/seqr-reference-data/GRCh37/combined_reference_data_grch37.ht, interval_ref_ht_path=, clinvar_ht_path=/seqr-reference-data/GRCh37/clinvar.GRCh37.ht, hgmd_ht_path=, sample_type=WES, dont_validate=False, dataset_type=VARIANTS, remap_path=, subset_path=, vep_config_json_path=/vep_configs/vep-GRCh37-loftee.json, grch38_to_grch37_ref_chain=gs://hail-common/references/grch38_to_grch37.over.chain.gz, hail_temp_dir=)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/luigi/worker.py", line 203, in run
new_deps = self._run_get_new_deps()
File "/usr/local/lib/python3.10/site-packages/luigi/worker.py", line 138, in _run_get_new_deps
task_gen = self.task.run()
File "/seqr-loading-pipelines/luigi_pipeline/seqr_loading.py", line 88, in run
self.read_input_write_mt()
File "/seqr-loading-pipelines/luigi_pipeline/seqr_loading.py", line 130, in read_input_write_mt
self.validate_mt(mt, self.genome_version, self.sample_type)
File "/seqr-loading-pipelines/luigi_pipeline/seqr_loading.py", line 222, in validate_mt
raise SeqrValidationError(
SeqrValidationError: Genome version validation error: dataset specified as GRCh37 but doesn't contain the expected number of common GRCh37 variants
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task SeqrVCFToMTTask__seqr_reference__VARIANTS__input_vcfs_GRCh_a332994e88 has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 2 pending tasks possibly being run by other workers
DEBUG: There are 2 pending tasks unique to this worker
DEBUG: There are 2 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=7834400869, workers=1, host=98fd7f59ccec, username=root, pid=33790) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====
Scheduled 3 tasks of which:
This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====
--dont-validate: command not found
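The last log line, together with `'dont_validate': False` in the task parameters above, is consistent with the shell ending the command at the `--dest-path` line: that line has no trailing backslash, so the pipeline ran without the flag and the shell then tried to execute `--dont-validate` as a command of its own. A minimal bash sketch of that behaviour, using a hypothetical `print_args` function in place of the pipeline:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the pipeline: reports how many arguments it got
# and what the last one was.
print_args() {
  echo "got $# args, last: ${!#}"
}

# As in the question: the line before --dont-validate has no trailing
# backslash, so the newline ends the print_args command there. The shell then
# tries to run --dont-validate as a separate command, which fails with
# "--dont-validate: command not found" (exit status 127).
broken() {
  print_args --source-paths in.vcf.gz \
    --dest-path out.mt
  --dont-validate
}

# With the backslash, --dont-validate is simply one more argument.
fixed() {
  print_args --source-paths in.vcf.gz \
    --dest-path out.mt \
    --dont-validate
}

broken
fixed
```

`broken` prints `got 4 args, last: out.mt` and then a `command not found` error for `--dont-validate`, while `fixed` prints `got 5 args, last: --dont-validate`.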
Could you please look into this and let me know how to solve this issue?
Regards,
Tulasi