Error in rule polish_clusters (both with the test data and with real data) #13

scogi · 2021-10-25T07:48:07Z

Hello,
I installed the pipeline, which seems to have worked fine, but when running the pipeline test as recommended

# To test if the installation was successful run
$ snakemake -j 1 -pr --configfile config.yml

I get an error at step 6 (possibly the same as in #5). The same error also occurs with real data. I would be very grateful for any help. Thanks!

$ snakemake -j 1 -pr --configfile config.yml
Targets: EGFR_917
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                            count    min threads    max threads
---------------------------  -------  -------------  -------------
cluster                            1              1              1
cluster_consensus                  1              1              1
copy_bed                           1              1              1
detect_umi_consensus_fasta         1              1              1
detect_umi_fasta                   1              1              1
map_1d                             1              1              1
map_consensus                      2              1              1
polish_clusters                    1              1              1
reads                              1              1              1
reformat_consensus_clusters        1              1              1
reformat_filter_clusters           1              1              1
seqkit_bam_acc_tsv                 1              1              1
split_reads                        1              1              1
total                             14              1              1

Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule copy_bed:
input: data/example_egfr_amplicon.bed
output: example_egfr_single_read_run/targets.bed
jobid: 1
reason: Missing output files: example_egfr_single_read_run/targets.bed
wildcards: name=example_egfr_single_read_run
resources: tmpdir=/tmp

cp data/example_egfr_amplicon.bed example_egfr_single_read_run/targets.bed
[Mon Oct 25 09:11:15 2021]
Finished job 1.
1 of 14 steps (7%) done
Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule map_1d:
input: data/example_egfr_single_cluster.fastq, data/example_egfr_reference.fasta
output: example_egfr_single_read_run/align/1d.bam, example_egfr_single_read_run/align/1d.bam.bai
jobid: 11
reason: Missing output files: example_egfr_single_read_run/align/1d.bam
wildcards: name=example_egfr_single_read_run
resources: tmpdir=/tmp

catfishq --max_n 0 data/example_egfr_single_cluster.fastq | minimap2 -ax map-ont -k 13 -t 1 data/example_egfr_reference.fasta - | samtools sort -@ 5 -o example_egfr_single_read_run/align/1d.bam - && samtools index -@ 1 example_egfr_single_read_run/align/1d.bam
[M::mm_idx_gen::0.006*1.09] collected minimizers
[M::mm_idx_gen::0.013*1.04] sorted minimizers
[M::main::0.013*1.04] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.013*1.04] mid_occ = 13
[M::mm_idx_stat] kmer size: 13; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.014*1.04] distinct minimizers: 35388 (98.51% are singletons); average occurrences: 1.028; average spacing: 5.326; total length: 193845
[M::worker_pipeline::0.086*0.42] mapped 50 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -ax map-ont -k 13 -t 1 data/example_egfr_reference.fasta -
[M::main] Real time: 0.088 sec; CPU: 0.039 sec; Peak RSS: 0.008 GB
[Mon Oct 25 09:11:15 2021]
Finished job 11.
2 of 14 steps (14%) done
Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule split_reads:
input: example_egfr_single_read_run/align/1d.bam
output: example_egfr_single_read_run/fasta_filtered, example_egfr_single_read_run/stats/umi_filter_reads_stats.txt
jobid: 10
reason: Missing output files: example_egfr_single_read_run/fasta_filtered; Input files updated by another job: example_egfr_single_read_run/align/1d.bam
wildcards: name=example_egfr_single_read_run
resources: tmpdir=/tmp

    mkdir -p example_egfr_single_read_run/fasta_filtered
    umi_filter_reads --min_overlap 0.9 -o example_egfr_single_read_run/fasta_filtered data/example_egfr_amplicon.bed example_egfr_single_read_run/align/1d.bam 2>&1 | tee example_egfr_single_read_run/stats/umi_filter_reads_stats.txt

Reads found: 50
Reads unmapped: 0 (0%)
EGFR_917
Reads found: 50
On target: 50 (100%)
0 concatamers - 0%
0 short - 0%
[Mon Oct 25 09:11:15 2021]
Finished job 10.
3 of 14 steps (21%) done
Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule detect_umi_fasta:
input: example_egfr_single_read_run/fasta_filtered
output: example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta
jobid: 9
reason: Missing output files: example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta; Input files updated by another job: example_egfr_single_read_run/fasta_filtered
wildcards: name=example_egfr_single_read_run, target=EGFR_917
resources: tmpdir=/tmp

    umi_extract --fwd-context GTATCGTGTAGAGACTGCGTAGG --rev-context AGTGATCGAGTCAGTGCGAGTG --fwd-umi TTTVVVVTTVVVVTTVVVVTTVVVVTTT --rev-umi AAABBBBAABBBBAABBBBAABBBBAAA --max-error 3 example_egfr_single_read_run/fasta_filtered/EGFR_917.fastq -o example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta --tsv example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta.tsv

Counting reads in example_egfr_single_read_run/fasta_filtered/EGFR_917.fastq
100%|█████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 17395.09it/s]
Found 25 fwd and 25 rev reads (ratio: 1.0)
100.0% of reads contained both UMIs with max 3 mismatches
[Mon Oct 25 09:11:15 2021]
Finished job 9.
4 of 14 steps (29%) done
Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule cluster:
input: example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta
output: example_egfr_single_read_run/clustering/EGFR_917/clusters_centroid.fasta, example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta, example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters
jobid: 8
reason: Missing output files: example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta, example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters; Input files updated by another job: example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta
wildcards: name=example_egfr_single_read_run, target=EGFR_917
resources: tmpdir=/tmp

mkdir -p example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters && vsearch --clusterout_id --clusters example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters/test --centroids example_egfr_single_read_run/clustering/EGFR_917/clusters_centroid.fasta --consout example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta --minseqlength 40 --maxseqlength 60 --qmask none --threads 1 --cluster_fast example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta --clusterout_sort --gapopen 0E/5I --gapext 0E/2I --mismatch -8 --match 6 --iddef 0 --minwordmatches 0 --qmask none -id 0.85
vsearch v2.18.0_linux_x86_64, 62.7GB RAM, 32 cores
https://github.com/torognes/vsearch

Reading file example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta 100%
2802 nt in 50 seqs, min 54, max 58, avg 56
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 1 Size min 50, max 50, avg 50.0
Singletons: 0, 0.0% of seqs, 0.0% of clusters
Multiple alignments 100%
[Mon Oct 25 09:11:16 2021]
Finished job 8.
5 of 14 steps (36%) done
Select jobs to execute...

[Mon Oct 25 09:11:16 2021]
rule reformat_filter_clusters:
input: example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta, example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters
output: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/stats/EGFR_917_vsearch_cluster_stats.tsv, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa
jobid: 7
reason: Missing output files: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/stats/EGFR_917_vsearch_cluster_stats.tsv, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa; Input files updated by another job: example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta, example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters
wildcards: name=example_egfr_single_read_run, target=EGFR_917
resources: tmpdir=/tmp

umi_parse_clusters --smolecule_out example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa --balance_strands --min_reads_per_clusters 20 --max_reads_per_clusters 60 --stats_out example_egfr_single_read_run/stats/EGFR_917_vsearch_cluster_stats.tsv -o example_egfr_single_read_run/clustering/EGFR_917/clusters_fa example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters
100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1599.05it/s]
Clusters: 100% written (1)
Reads: 50 found
Reads: 0 removed (0.0%)
Reads: 100% written
Reads: 100% in written clusters
[Mon Oct 25 09:11:16 2021]
Finished job 7.
6 of 14 steps (43%) done
Select jobs to execute...

[Mon Oct 25 09:11:16 2021]
rule polish_clusters:
input: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa
output: example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp, example_egfr_single_read_run/fasta/EGFR_917_consensus.bam, example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
jobid: 6
reason: Missing output files: example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta; Input files updated by another job: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa
wildcards: name=example_egfr_single_read_run, target=EGFR_917
resources: tmpdir=/tmp

    rm -rf example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp
    medaka smolecule --threads 1 --length 50 --depth 2 --model r941_min_high_g360 --method spoa example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp 2> example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log
    cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/consensus.fasta example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
    cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam example_egfr_single_read_run/fasta/EGFR_917_consensus.bam && cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam.bai example_egfr_single_read_run/fasta/EGFR_917_consensus.bam.bai

[Mon Oct 25 09:11:19 2021]
Error in rule polish_clusters:
jobid: 6
output: example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp, example_egfr_single_read_run/fasta/EGFR_917_consensus.bam, example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
shell:

    rm -rf example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp
    medaka smolecule --threads 1 --length 50 --depth 2 --model r941_min_high_g360 --method spoa example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp 2> example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log
    cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/consensus.fasta example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
    cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam example_egfr_single_read_run/fasta/EGFR_917_consensus.bam && cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam.bai example_egfr_single_read_run/fasta/EGFR_917_consensus.bam.bai
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: pipeline-umi-amplicon/.snakemake/log/2021-10-25T091115.498024.snakemake.log
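
Note that the Snakemake output above does not contain medaka's own error message: the polish_clusters command redirects stderr (2>) to example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log, so the actual traceback, if any, should be in that file. A minimal sketch for printing the end of that log follows. The helper name tail and the 20-line default are illustrative assumptions, not part of the pipeline.

```python
from collections import deque
from pathlib import Path

# Path copied from the "2>" redirect in the polish_clusters shell command.
LOG_PATH = "example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log"


def tail(path, n=20):
    """Return the last n lines of a text file, or an empty list if it is missing."""
    log = Path(path)
    if not log.is_file():
        return []
    with log.open(errors="replace") as handle:
        # deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


if __name__ == "__main__":
    lines = tail(LOG_PATH)
    if lines:
        print("\n".join(lines))
    else:
        print(f"No log found at {LOG_PATH}")
```

Run it from the pipeline directory (the one containing config.yml) so the relative path resolves.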


samir-watson · 2021-11-02T13:38:09Z

I'm having the same problem.

samir-watson · 2021-11-02T14:13:09Z

I went through the other posted issues and found that pip install numpy==1.19.5 fixed the problem for me.

scogi · 2021-11-05T19:40:28Z

Hi @samir-watson, thank you for pointing this out. I also found the issue you are probably referring to (#11). Unfortunately, installing numpy 1.19.5 did not solve the problem for me. I followed the procedure described in #11 but still get the same error.

scogi · 2021-11-09T07:26:15Z

Update: I installed the pipeline on my laptop as well and hit the same issue. There, installing numpy 1.19.5 did fix the problem, but it did not on our data analysis system.
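
Since the downgrade helped on one machine but not the other, it may be worth confirming which numpy the pipeline's Python interpreter actually imports. A minimal sketch, assuming it is run with the same interpreter (conda environment) that medaka uses. The 1.19.5 threshold is simply the version reported as working in this thread, and the helper names are hypothetical.

```python
PINNED = "1.19.5"  # version reported as working in this thread


def version_tuple(version):
    """Turn '1.21.2' into (1, 21, 2); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad short versions like '1.19'
        parts.append(0)
    return tuple(parts)


def newer_than_pin(installed, pinned=PINNED):
    """True if the installed version is strictly newer than the pinned one."""
    return version_tuple(installed) > version_tuple(pinned)


def report():
    """Describe the numpy that this interpreter would import, if any."""
    try:
        import numpy
    except ImportError:
        return "numpy is not importable from this interpreter"
    installed = numpy.__version__
    status = "newer than" if newer_than_pin(installed) else "not newer than"
    return f"numpy {installed} from {numpy.__file__} is {status} {PINNED}"


if __name__ == "__main__":
    print(report())
```

If the reported path points outside the pipeline's environment, the pip install may have landed in a different Python than the one Snakemake runs.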

Numpy error: #13. Solution (downgrade numpy): #13 (comment). With reference to the above, the proposal is to pin numpy==1.19.5; otherwise, installing with the provided conda instructions defaults to numpy-base 1.21.2, which triggers the error.

scogi mentioned this issue Nov 5, 2021

the installation test of pipeline-umi-amplicon failed #11

Open

scogi closed this as completed Nov 9, 2021

CrispyNuggetD mentioned this issue Mar 21, 2022

Specify numpy version prevents error #15

Merged
