Error in rule polish_clusters (both with the test data and with real data) #13

Closed
scogi opened this issue Oct 25, 2021 · 4 comments

@scogi

scogi commented Oct 25, 2021

Hello,
I installed the pipeline, and the installation seems to have worked fine, but when running the pipeline test as recommended

# To test if the installation was successful run
$ snakemake -j 1 -pr --configfile config.yml

I get an error at step 6 (possibly the same as in #5). The same error also occurs with real data. I would be very grateful for any help. Thanks!

$ snakemake -j 1 -pr --configfile config.yml
Targets: EGFR_917
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                            count    min threads    max threads
---------------------------  -------  -------------  -------------
cluster                            1              1              1
cluster_consensus                  1              1              1
copy_bed                           1              1              1
detect_umi_consensus_fasta         1              1              1
detect_umi_fasta                   1              1              1
map_1d                             1              1              1
map_consensus                      2              1              1
polish_clusters                    1              1              1
reads                              1              1              1
reformat_consensus_clusters        1              1              1
reformat_filter_clusters           1              1              1
seqkit_bam_acc_tsv                 1              1              1
split_reads                        1              1              1
total                             14              1              1

Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule copy_bed:
input: data/example_egfr_amplicon.bed
output: example_egfr_single_read_run/targets.bed
jobid: 1
reason: Missing output files: example_egfr_single_read_run/targets.bed
wildcards: name=example_egfr_single_read_run
resources: tmpdir=/tmp

cp data/example_egfr_amplicon.bed example_egfr_single_read_run/targets.bed
[Mon Oct 25 09:11:15 2021]
Finished job 1.
1 of 14 steps (7%) done
Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule map_1d:
input: data/example_egfr_single_cluster.fastq, data/example_egfr_reference.fasta
output: example_egfr_single_read_run/align/1d.bam, example_egfr_single_read_run/align/1d.bam.bai
jobid: 11
reason: Missing output files: example_egfr_single_read_run/align/1d.bam
wildcards: name=example_egfr_single_read_run
resources: tmpdir=/tmp

catfishq --max_n 0 data/example_egfr_single_cluster.fastq | minimap2 -ax map-ont -k 13 -t 1 data/example_egfr_reference.fasta - | samtools sort -@ 5 -o example_egfr_single_read_run/align/1d.bam - && samtools index -@ 1 example_egfr_single_read_run/align/1d.bam
[M::mm_idx_gen::0.006*1.09] collected minimizers
[M::mm_idx_gen::0.013*1.04] sorted minimizers
[M::main::0.013*1.04] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.013*1.04] mid_occ = 13
[M::mm_idx_stat] kmer size: 13; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.014*1.04] distinct minimizers: 35388 (98.51% are singletons); average occurrences: 1.028; average spacing: 5.326; total length: 193845
[M::worker_pipeline::0.086*0.42] mapped 50 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -ax map-ont -k 13 -t 1 data/example_egfr_reference.fasta -
[M::main] Real time: 0.088 sec; CPU: 0.039 sec; Peak RSS: 0.008 GB
[Mon Oct 25 09:11:15 2021]
Finished job 11.
2 of 14 steps (14%) done
Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule split_reads:
input: example_egfr_single_read_run/align/1d.bam
output: example_egfr_single_read_run/fasta_filtered, example_egfr_single_read_run/stats/umi_filter_reads_stats.txt
jobid: 10
reason: Missing output files: example_egfr_single_read_run/fasta_filtered; Input files updated by another job: example_egfr_single_read_run/align/1d.bam
wildcards: name=example_egfr_single_read_run
resources: tmpdir=/tmp

    mkdir -p example_egfr_single_read_run/fasta_filtered
    umi_filter_reads --min_overlap 0.9 -o example_egfr_single_read_run/fasta_filtered data/example_egfr_amplicon.bed example_egfr_single_read_run/align/1d.bam 2>&1 | tee example_egfr_single_read_run/stats/umi_filter_reads_stats.txt

Reads found: 50
Reads unmapped: 0 (0%)
EGFR_917
Reads found: 50
On target: 50 (100%)
0 concatamers - 0%
0 short - 0%
[Mon Oct 25 09:11:15 2021]
Finished job 10.
3 of 14 steps (21%) done
Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule detect_umi_fasta:
input: example_egfr_single_read_run/fasta_filtered
output: example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta
jobid: 9
reason: Missing output files: example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta; Input files updated by another job: example_egfr_single_read_run/fasta_filtered
wildcards: name=example_egfr_single_read_run, target=EGFR_917
resources: tmpdir=/tmp

    umi_extract --fwd-context GTATCGTGTAGAGACTGCGTAGG --rev-context AGTGATCGAGTCAGTGCGAGTG --fwd-umi TTTVVVVTTVVVVTTVVVVTTVVVVTTT --rev-umi AAABBBBAABBBBAABBBBAABBBBAAA --max-error 3 example_egfr_single_read_run/fasta_filtered/EGFR_917.fastq -o example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta --tsv example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta.tsv

Counting reads in example_egfr_single_read_run/fasta_filtered/EGFR_917.fastq
100%|█████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 17395.09it/s]
Found 25 fwd and 25 rev reads (ratio: 1.0)
100.0% of reads contained both UMIs with max 3 mismatches
[Mon Oct 25 09:11:15 2021]
Finished job 9.
4 of 14 steps (29%) done
Select jobs to execute...

[Mon Oct 25 09:11:15 2021]
rule cluster:
input: example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta
output: example_egfr_single_read_run/clustering/EGFR_917/clusters_centroid.fasta, example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta, example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters
jobid: 8
reason: Missing output files: example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta, example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters; Input files updated by another job: example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta
wildcards: name=example_egfr_single_read_run, target=EGFR_917
resources: tmpdir=/tmp

mkdir -p example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters && vsearch --clusterout_id --clusters example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters/test --centroids example_egfr_single_read_run/clustering/EGFR_917/clusters_centroid.fasta --consout example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta --minseqlength 40 --maxseqlength 60 --qmask none --threads 1 --cluster_fast example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta --clusterout_sort --gapopen 0E/5I --gapext 0E/2I --mismatch -8 --match 6 --iddef 0 --minwordmatches 0 --qmask none -id 0.85
vsearch v2.18.0_linux_x86_64, 62.7GB RAM, 32 cores
https://github.com/torognes/vsearch

Reading file example_egfr_single_read_run/fasta_umi/EGFR_917_detected_umis.fasta 100%
2802 nt in 50 seqs, min 54, max 58, avg 56
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 1 Size min 50, max 50, avg 50.0
Singletons: 0, 0.0% of seqs, 0.0% of clusters
Multiple alignments 100%
[Mon Oct 25 09:11:16 2021]
Finished job 8.
5 of 14 steps (36%) done
Select jobs to execute...

[Mon Oct 25 09:11:16 2021]
rule reformat_filter_clusters:
input: example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta, example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters
output: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/stats/EGFR_917_vsearch_cluster_stats.tsv, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa
jobid: 7
reason: Missing output files: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/stats/EGFR_917_vsearch_cluster_stats.tsv, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa; Input files updated by another job: example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta, example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters
wildcards: name=example_egfr_single_read_run, target=EGFR_917
resources: tmpdir=/tmp

umi_parse_clusters --smolecule_out example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa --balance_strands --min_reads_per_clusters 20 --max_reads_per_clusters 60 --stats_out example_egfr_single_read_run/stats/EGFR_917_vsearch_cluster_stats.tsv -o example_egfr_single_read_run/clustering/EGFR_917/clusters_fa example_egfr_single_read_run/clustering/EGFR_917/clusters_consensus.fasta example_egfr_single_read_run/clustering/EGFR_917/vsearch_clusters
100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1599.05it/s]
Clusters: 100% written (1)
Reads: 50 found
Reads: 0 removed (0.0%)
Reads: 100% written
Reads: 100% in written clusters
[Mon Oct 25 09:11:16 2021]
Finished job 7.
6 of 14 steps (43%) done
Select jobs to execute...

[Mon Oct 25 09:11:16 2021]
rule polish_clusters:
input: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa
output: example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp, example_egfr_single_read_run/fasta/EGFR_917_consensus.bam, example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
jobid: 6
reason: Missing output files: example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta; Input files updated by another job: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa
wildcards: name=example_egfr_single_read_run, target=EGFR_917
resources: tmpdir=/tmp

    rm -rf example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp
    medaka smolecule --threads 1 --length 50 --depth 2 --model r941_min_high_g360 --method spoa example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp 2> example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log
    cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/consensus.fasta example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
    cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam example_egfr_single_read_run/fasta/EGFR_917_consensus.bam && cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam.bai example_egfr_single_read_run/fasta/EGFR_917_consensus.bam.bai

[Mon Oct 25 09:11:19 2021]
Error in rule polish_clusters:
jobid: 6
output: example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp, example_egfr_single_read_run/fasta/EGFR_917_consensus.bam, example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
shell:

    rm -rf example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp
    medaka smolecule --threads 1 --length 50 --depth 2 --model r941_min_high_g360 --method spoa example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp 2> example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log
    cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/consensus.fasta example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
    cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam example_egfr_single_read_run/fasta/EGFR_917_consensus.bam && cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam.bai example_egfr_single_read_run/fasta/EGFR_917_consensus.bam.bai
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Complete log: pipeline-umi-amplicon/.snakemake/log/2021-10-25T091115.498024.snakemake.log
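
Editorial note: the Snakemake message above only reports a non-zero exit code. In the shell command shown for polish_clusters, medaka's stderr is redirected (`2>`) to `example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log`, so the underlying error is most likely in that file rather than in the console output. A minimal sketch for printing the end of that log follows; the helper name `tail_log` is hypothetical and not part of the pipeline.

```python
from collections import deque
from pathlib import Path


def tail_log(path, n=20):
    """Return the last n lines of a text log, or [] if the file is missing."""
    log = Path(path)
    if not log.is_file():
        return []
    with log.open(errors="replace") as handle:
        # deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


if __name__ == "__main__":
    # Path copied from the `2>` redirect in the polish_clusters shell command.
    log_path = "example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log"
    for line in tail_log(log_path):
        print(line)
```

Run from the pipeline directory after the failed job; posting the printed lines alongside the Snakemake output usually makes the failure easier to diagnose.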

@samir-watson

I'm having the same problem.

@samir-watson

I went through the other issues posted and found that pip install numpy==1.19.5 fixed the problem for me.
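
Editorial note: before downgrading, it can help to confirm which numpy version the pipeline environment actually has. The sketch below compares the installed version against the 1.19.5 pin reported to work in this thread. The function names are hypothetical, and the comparison is a simple numeric one that ignores pre-release suffixes.

```python
from importlib import metadata

PINNED = "1.19.5"  # version reported to work in this thread


def parse_version(text):
    """Turn '1.21.2' into (1, 21, 2); the first non-numeric part ends the parse."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than_pin(installed, pinned=PINNED):
    """True if the installed version is strictly newer than the pinned one."""
    return parse_version(installed) > parse_version(pinned)


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("numpy is not installed in this environment")
    else:
        if is_newer_than_pin(installed):
            print(f"numpy {installed} is newer than {PINNED}; the downgrade may apply")
        else:
            print(f"numpy {installed} is not newer than {PINNED}")
```

Run it with the Python interpreter of the environment that Snakemake uses; a different interpreter may report a different numpy.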

@scogi
Author

scogi commented Nov 5, 2021

Hi @samir-watson, thank you for pointing this out. I also found the issue you are probably referring to (#11). Unfortunately, installing numpy 1.19.5 did not solve the problem for me. I followed the procedure described in #11, but the problem persists.

@scogi
Author

scogi commented Nov 9, 2021

Update: I installed the pipeline on my laptop as well and experienced the same issue. There, installing numpy 1.19.5 did fix the problem, but it did not on our data analysis system.
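
Editorial note: since the same fix worked on one machine and not on the other, comparing the two environments may narrow down the difference. The sketch below collects the Python version and the installed versions of a few packages so the output can be diffed between systems. The function name `collect_versions` is hypothetical, and the package list (`numpy`, `medaka`, `snakemake`) is an assumption based on the tools that appear in the log above.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Map 'python' and each package name to its version, or None if not installed."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; extend it with whatever else the environment pins.
    for name, version in collect_versions(["numpy", "medaka", "snakemake"]).items():
        print(f"{name}\t{version}")
```

Running this in the pipeline environment on both the laptop and the analysis system gives two short listings that can be compared line by line.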

@scogi scogi closed this as completed Nov 9, 2021
CrispyNuggetD added a commit to CrispyNuggetD/pipeline-umi-amplicon that referenced this issue Mar 21, 2022
Numpy Error:
nanoporetech#13

Solution (Downgrade Numpy):
nanoporetech#13 (comment)

With reference to the above: proposal to pin numpy==1.19.5, as otherwise numpy defaults to numpy-base-1.21.2 when installed using the provided conda instructions, which errors.
cjw85 pushed a commit that referenced this issue Mar 21, 2022
Numpy Error:
#13

Solution (Downgrade Numpy):
#13 (comment)

With reference to the above: proposal to pin numpy==1.19.5, as otherwise numpy defaults to numpy-base-1.21.2 when installed using the provided conda instructions, which errors.