
Handling of samples with too few reads (or no reads at all) #463

Open
jfr019 opened this issue May 28, 2024 · 1 comment
Labels
enhancement New feature or request

Comments


jfr019 commented May 28, 2024

Is your feature request related to a problem? Please describe

The pipeline stops when a batch of samples contains a sample with very few reads or no reads at all. This is a problem when the pipeline is started automatically as soon as the sequencer has finished, without anyone inspecting the input data first.

Some rules and programs can handle missing input while others can't.
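As a stopgap for automated starts, low-read samples could be detected before the run is launched. The sketch below counts reads in each sample's FASTQ file (plain or gzip-compressed, assuming standard 4-line records) and splits samples into kept and dropped lists. The function names and the `min_reads` cutoff are hypothetical and not part of the pipeline; treating a missing or empty file as 0 reads is also an assumption.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming 4 lines per record.

    A missing or zero-byte file is treated as containing 0 reads.
    """
    path = Path(path)
    if not path.exists() or path.stat().st_size == 0:
        return 0
    # Open gzip-compressed files transparently as text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def split_samples_by_read_count(fastq_by_sample, min_reads):
    """Return (kept, dropped) sample names; min_reads is a hypothetical cutoff."""
    kept, dropped = [], []
    for sample, fastq in fastq_by_sample.items():
        if count_fastq_reads(fastq) >= min_reads:
            kept.append(sample)
        else:
            dropped.append(sample)
    return kept, dropped
```

Counting every line is slow for large runs; in practice a check for "at least `min_reads` records" could stop reading early once the cutoff is reached.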

Error messages came from the following rules:

Error in rule fusions_fuseq_wes:
Error in rule cnv_sv_gatk_denoise_read_counts:
Error in rule cnv_sv_manta_run_workflow_t:
Error in rule cnv_sv_purecn_coverage:
Error in rule biomarker_cnvkit2scarhrd:
Error in rule cnv_sv_cnvkit_vcf:
Error in rule annotation_vep:
Error in rule annotation_vep_wo_pick:

Describe the solution you'd like

All rules should handle missing or empty input without stopping the pipeline.

Describe alternatives you've considered

Adding error handling for programs that exit with a non-zero exit status.
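One possible shape for such error handling is a wrapper that runs the tool, logs its output, and, if it exits with a non-zero status, writes empty placeholder outputs instead of failing the job. This is a hypothetical sketch (the name `run_or_placeholder` and its arguments are illustrative, not an existing pipeline function), and it assumes downstream steps can tolerate empty files.

```python
import subprocess
from pathlib import Path


def run_or_placeholder(cmd, outputs, log_path):
    """Run cmd; on a non-zero exit, create empty placeholder outputs.

    Returns the command's exit status so the caller can still record the failure.
    """
    # Capture the tool's stdout and stderr in the log file.
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    if result.returncode != 0:
        with open(log_path, "a") as log:
            log.write(
                f"\nCommand exited with status {result.returncode}; "
                "writing empty placeholder outputs.\n"
            )
        for out in outputs:
            # Empty files let the job "succeed"; downstream rules must accept them.
            Path(out).parent.mkdir(parents=True, exist_ok=True)
            Path(out).touch()
    return result.returncode
```

The trade-off is that a placeholder hides the failure from Snakemake, so the exit status (or the log) still needs to be surfaced somewhere, otherwise a failed sample looks like one that simply had no findings.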

Additional context

jfr019 added the enhancement (New feature or request) label on May 28, 2024
maehler (Contributor) commented Jun 19, 2024

What I have been doing is running the pipeline with the Snakemake flag --keep-going. That way, all branches of the DAG that don't hit an error can run to completion, and ideally only the affected sample(s) end up with missing files.

Perhaps we could have some kind of output, like a simple tsv/yaml/json file, with the status of each sample after the pipeline completes. That would make it easier and less error-prone to figure out which sample(s) failed.
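A per-sample status file could be as simple as checking, after a --keep-going run, which expected outputs exist. The sketch below writes a TSV with one row per sample; the function name, column names, and the choice of which files count as "expected" are all assumptions for illustration.

```python
import csv
from pathlib import Path


def write_sample_status(expected_outputs, tsv_path):
    """Write sample, status and missing outputs to a TSV; return the rows.

    expected_outputs maps sample name -> list of paths that a successful run
    should have produced (which paths to list is decided per pipeline).
    """
    rows = []
    for sample, paths in sorted(expected_outputs.items()):
        missing = [str(p) for p in paths if not Path(p).exists()]
        status = "OK" if not missing else "FAILED"
        rows.append((sample, status, ",".join(missing)))
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["sample", "status", "missing_outputs"])
        writer.writerows(rows)
    return rows
```

In a Snakemake workflow, something like this could be called from the onsuccess and onerror handlers so the summary is written whether or not some jobs failed.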
