read2tree issue on mixed dataset #82

Open
masudermann opened this issue Jul 15, 2024 · 0 comments
Labels
bug Something isn't working

Description of the bug

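This is a read2tree-specific issue, but I wanted to share it in case we see similar issues with other datasets. With the mixed test profile, the READ2TREE process for the oomycete group exits with status 2 and the error "The number of completed mappings (1) is too little to perform a merge."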

Command used and terminal output

# Main command
CHANGELOG.md  CODE_OF_CONDUCT.md  docs         LICENSE                modules.json  null                  pyproject.toml     seqtk_sample  test                    workflows
(nf-core) marthasudermann@pop-os:~/pathogensurveillance$ nextflow run main.nf -profile mixed,docker -resume
Nextflow 24.04.3 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [sick_leibniz] DSL2 - revision: cc83aa0c27


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/plantpathsurveil v1.0dev
------------------------------------------------------
Core Nextflow options
  runName                   : sick_leibniz
  containerEngine           : docker
  launchDir                 : /home/marthasudermann/pathogensurveillance
  workDir                   : /home/marthasudermann/pathogensurveillance/work
  projectDir                : /home/marthasudermann/pathogensurveillance
  userName                  : marthasudermann
  profile                   : mixed,docker
  configFiles               : /home/marthasudermann/pathogensurveillance/nextflow.config

Input/output options
  sample_data               : test/data/metadata/mixed.csv
  out_dir                   : test/output/mixed
  download_bakta_db         : true

Institutional config options
  config_profile_name       : Test profile of mixed (fungi, oomycete, bacteria, nematode) SRA files
  config_profile_description: Test profile of mixed (fungi, oomycete, bacteria, nematode) SRA files

Generic options
  trace_dir                 : null/pipeline_info

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/plantpathsurveil for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/plantpathsurveil/blob/master/CITATIONS.md
------------------------------------------------------
# Errors I am getting

ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:BUSCO_PHYLOGENY:READ2TREE (oomycete)'

Caused by:
  Process `PATHOGENSURVEILLANCE:BUSCO_PHYLOGENY:READ2TREE (oomycete)` terminated with an error exit status (2)

Command executed:

  # This creates the reference folder
  read2tree --standalone_path oomycete_busco_markers/ --dna_reference oomycete_dna_ref.fa --output_path oomycete_read2tree --reference
  
  # Add each paired end shortread sample
  for R1 in paired_1_01.fa; do
     	R2=$(echo $R1 | sed 's/^paired_1_/paired_2_/')
     	read2tree \
          \
         --threads 8 \
     	--standalone_path oomycete_busco_markers/ \
         --dna_reference oomycete_dna_ref.fa \
     	--output_path oomycete_read2tree \
     	--reads $R1 $R2
  done
  
  # Add each single end shortread sample
  for R1 in ; do
     	read2tree \
          \
         --threads 8 \
     	--standalone_path oomycete_busco_markers/ \
         --dna_reference oomycete_dna_ref.fa \
     	--output_path oomycete_read2tree \
     	--reads $R1
  done
  
  # Add each long read sample
  for R1 in ; do
     	read2tree \
          \
         --threads 8 \
     	--standalone_path oomycete_busco_markers/ \
         --dna_reference oomycete_dna_ref.fa \
     	--output_path oomycete_read2tree \
         --read_type long
     	--reads $R1
  done
  
  # Build tree
  read2tree \
      \
     --threads 8 \
  --standalone_path oomycete_busco_markers/ \
     --dna_reference oomycete_dna_ref.fa \
  --output_path oomycete_read2tree -\
  -merge_all_mappings \
  --tree
  
  cat <<-END_VERSIONS > versions.yml
  "PATHOGENSURVEILLANCE:BUSCO_PHYLOGENY:READ2TREE":
     	read2tree: $(echo $(read2tree --version))
  END_VERSIONS

Command exit status:
  2

Command output:
  --- Load OGs with min 0 species from oma oomycete_busco_markers - mode = marker_genes ---
  2024-07-15 21:29:26,401 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq from oomycete_dna_ref.fa ---
  2024-07-15 21:29:26,401 - read2tree.OGSet - INFO - Loading oomycete_dna_ref.fa into memory. This might take a while . . . 
  2024-07-15 21:29:26,435 - read2tree.OGSet - INFO - : Gathering of DNA seq for 249 OGs took 0.02776622772216797.
  --- Generating reference for mapping ---
  2024-07-15 21:29:26,436 - read2tree.ReferenceSet - INFO - : Extracted 4 reference species form 249 ogs took 0.0007266998291015625
  --- Alignment of 249 OGs ---
  2024-07-15 21:29:56,377 - read2tree.Aligner - INFO - : Alignment of 249 OGs took 29.93824291229248.
  --- Re-load ogs and find their corresponding DNA seq from output folder ---
  --- Generating reference for mapping from folder ---
  --- Mapping of reads to reference sequences ---
  2024-07-15 21:29:57,076 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PERBE reference species ---
  2024-07-15 21:32:43,397 - read2tree.Mapper - INFO - paired_1_01: Mapped 221245 / 45873312 reads to PERBE_OGs.fa
  2024-07-15 21:32:43,453 - read2tree.Mapper - INFO - paired_1_01: Mapping to PERBE_OGs.fa references took 166.3768367767334.
  2024-07-15 21:32:45,102 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PERMA reference species ---
  2024-07-15 21:35:31,387 - read2tree.Mapper - INFO - paired_1_01: Mapped 116880 / 45873312 reads to PERMA_OGs.fa
  2024-07-15 21:35:31,440 - read2tree.Mapper - INFO - paired_1_01: Mapping to PERMA_OGs.fa references took 166.33702945709229.
  2024-07-15 21:35:32,207 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PERDE reference species ---
  2024-07-15 21:38:18,619 - read2tree.Mapper - INFO - paired_1_01: Mapped 128630 / 45873312 reads to PERDE_OGs.fa
  2024-07-15 21:38:18,688 - read2tree.Mapper - INFO - paired_1_01: Mapping to PERDE_OGs.fa references took 166.48007535934448.
  2024-07-15 21:38:19,859 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PHYNI reference species ---
  2024-07-15 21:41:09,594 - read2tree.Mapper - INFO - paired_1_01: Mapped 124871 / 45873312 reads to PHYNI_OGs.fa
  2024-07-15 21:41:09,652 - read2tree.Mapper - INFO - paired_1_01: Mapping to PHYNI_OGs.fa references took 169.79250645637512.
  2024-07-15 21:41:10,498 - read2tree.Mapper - INFO - paired_1_01: Mapping to all references took 673.4247016906738.
  --- Add inferred mapped sequence back to OGs ---
  2024-07-15 21:41:10,668 - read2tree.OGSet - INFO - paired_1_01: Appending 222 reconstructed sequences to present OG took 0.0045795440673828125.
  --- Add inferred mapped sequence back to alignment ---
  2024-07-15 21:41:11,089 - read2tree.Aligner - INFO - paired_1_01: Appending 213 reconstructed sequences to present Alignments took 0.39875292778015137.

Command error:
  
  Mapping reads to species:  50%|█████     | 2/4 [05:35<05:34, 167.48s/ species]2024-07-15 21:35:32,207 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PERDE reference species ---
  [E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_q_oc3inv/PERDE_OGs.fa.bam'
  2024-07-15 21:38:18,619 - read2tree.Mapper - INFO - paired_1_01: Mapped 128630 / 45873312 reads to PERDE_OGs.fa
  2024-07-15 21:38:18,688 - read2tree.Mapper - INFO - paired_1_01: Mapping to PERDE_OGs.fa references took 166.48007535934448.
  
  Mapping reads to species:  75%|███████▌  | 3/4 [08:22<02:47, 167.56s/ species]2024-07-15 21:38:19,859 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PHYNI reference species ---
  [E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_q_oc3inv/PHYNI_OGs.fa.bam'
  2024-07-15 21:41:09,594 - read2tree.Mapper - INFO - paired_1_01: Mapped 124871 / 45873312 reads to PHYNI_OGs.fa
  2024-07-15 21:41:09,652 - read2tree.Mapper - INFO - paired_1_01: Mapping to PHYNI_OGs.fa references took 169.79250645637512.
  
  Mapping reads to species: 100%|██████████| 4/4 [11:13<00:00, 168.77s/ species]
  Mapping reads to species: 100%|██████████| 4/4 [11:13<00:00, 168.35s/ species]
  2024-07-15 21:41:10,498 - read2tree.Mapper - INFO - paired_1_01: Mapping to all references took 673.4247016906738.
  
  Adding mapped seq to alignments:   0%|          | 0/249 [00:00<?, ? alignments/s]
  Adding mapped seq to alignments: 100%|██████████| 249/249 [00:00<00:00, 1963123.49 alignments/s]
  
  Adding mapped seq to OG:   0%|          | 0/249 [00:00<?, ? OGs/s]
  Adding mapped seq to OG: 100%|██████████| 249/249 [00:00<00:00, 2300400.21 OGs/s]
  --- Add inferred mapped sequence back to OGs ---
  
  Adding mapped seq to OG:   0%|          | 0/249 [00:00<?, ? OGs/s]
  Adding mapped seq to OG: 100%|██████████| 249/249 [00:00<00:00, 77110.28 OGs/s]
  2024-07-15 21:41:10,668 - read2tree.OGSet - INFO - paired_1_01: Appending 222 reconstructed sequences to present OG took 0.0045795440673828125.
  --- Add inferred mapped sequence back to alignment ---
  
  Adding mapped seq to alignments:   0%|          | 0/249 [00:00<?, ? alignments/s]
  Adding mapped seq to alignments:  33%|███▎      | 81/249 [00:00<00:00, 807.58 alignments/s]
  Adding mapped seq to alignments:  65%|██████▌   | 162/249 [00:00<00:00, 698.92 alignments/s]
  Adding mapped seq to alignments:  94%|█████████▎| 233/249 [00:00<00:00, 581.05 alignments/s]
  Adding mapped seq to alignments: 100%|██████████| 249/249 [00:00<00:00, 624.97 alignments/s]
  2024-07-15 21:41:11,089 - read2tree.Aligner - INFO - paired_1_01: Appending 213 reconstructed sequences to present Alignments took 0.39875292778015137.
  usage: read2tree [-h] [--version] [--output_path OUTPUT_PATH]
                   --standalone_path STANDALONE_PATH [--reads READS [READS ...]]
                   [--read_type READ_TYPE] [--threads THREADS] [--split_reads]
                   [--split_len SPLIT_LEN] [--split_overlap SPLIT_OVERLAP]
                   [--split_min_read_len SPLIT_MIN_READ_LEN] [--sample_reads]
                   [--genome_len GENOME_LEN] [--coverage COVERAGE]
                   [--min_cons_coverage MIN_CONS_COVERAGE]
                   [--dna_reference DNA_REFERENCE] [--sc_threshold SC_THRESHOLD]
                   [--ngmlr_parameters NGMLR_PARAMETERS] [--check_mate_pairing]
                   [--debug] [--sequence_selection_mode SEQUENCE_SELECTION_MODE]
                   [-s SPECIES_NAME] [--tree] [--merge_all_mappings] [-r]
                   [--min_species MIN_SPECIES] [--single_mapping SINGLE_MAPPING]
                   [--ref_folder REF_FOLDER]
                   [--remove_species_mapping REMOVE_SPECIES_MAPPING]
                   [--remove_species_ogs REMOVE_SPECIES_OGS] [--keep_all_ogs]
                   [--ignore_species IGNORE_SPECIES]
  read2tree: error: The number of completed mappings (1) is too little to perform a merge.

Work dir:
  /home/marthasudermann/pathogensurveillance/work/78/53a5214c3b5dd116ec616895015311

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
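
A possible guard for the merge step, as a minimal sketch only. It assumes that read2tree needs at least two completed mappings before `--merge_all_mappings` can run, which is what the error message suggests, and that only one sample (`paired_1_01`) ended up in the oomycete group. The helper names `count_existing` and `should_merge` are hypothetical, and the `paired_1_*.fa` pattern only stands in for however the process collects its read files.

```shell
# Hypothetical helper: count how many of the given paths actually exist.
# An unmatched glob is passed through as a literal string and is not counted.
count_existing() {
    local n=0 f
    for f in "$@"; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Hypothetical helper: succeed only when there are at least two samples.
# Assumption: a merge needs two or more completed mappings.
should_merge() {
    [ "$1" -ge 2 ]
}

# Illustrative pattern; a real script would also count single-end and long reads.
n_mapped=$(count_existing paired_1_*.fa)
if should_merge "$n_mapped"; then
    echo "merge: $n_mapped samples"
else
    echo "skip merge: only $n_mapped sample(s)"
fi
```

In the process script, the "Build tree" call would sit in the first branch, so a group with a single sample would skip tree building instead of failing the whole run. Whether skipping is the right behaviour for such groups is a separate design decision.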

Relevant files

No response

System information

No response

@masudermann masudermann added the bug Something isn't working label Jul 15, 2024