NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES Segmentation fault #1294

Closed
fjosefdz opened this issue May 7, 2024 · 16 comments
Labels
question Further information is requested

@fjosefdz

fjosefdz commented May 7, 2024

Description of the bug

Hi, I am trying to run the rnaseq pipeline (3.14.0), but I always get stuck at the STAR_ALIGN_IGENOMES step. The error is:

.command.sh: line 10: 42 Segmentation fault

I am using Singularity on an HPC cluster and submitting the job through Slurm. I tried adding more CPUs and more RAM, but I always get the same error. Is there a way to fix it? Is it a problem with the STAR version (2.7.10a)?

Command used and terminal output

source /home/conda/miniconda3/bin/activate env_nf

nextflow run nf-core/rnaseq  \
      --input /path/to/samplesheet.csv  \
      --outdir /path/to/output/  \
      --genome GRCh37  \
      --aligner 'star_salmon'  \
      -profile singularity  \
      --max_memory '64.GB'  \
      --max_cpus 16  \
      --max_time '240.h'  \
      -r 3.14.0

The error is similar to this one:
#684 (comment)

Relevant files

No response

System information

No response

@fjosefdz fjosefdz added the bug Something isn't working label May 7, 2024
@maxulysse
Member

Can you share more information, like the .nextflow.log file?

@fjosefdz
Author

fjosefdz commented May 8, 2024

Sure. Thank you.
.nextflow.log

@drpatelh drpatelh added question Further information is requested and removed bug Something isn't working labels May 13, 2024
@drpatelh drpatelh added this to the 3.15.0 milestone May 13, 2024
@pinin4fjords
Member

Do you also have the STAR logs from the process directory?

Also, just to be clear, something like --max_memory '64.GB' sets an upper bound on the memory that any process in the workflow can request, and in the case of STAR this would be a reduction of the default memory allocation.
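
To make the effect of the cap concrete, here is a minimal sketch of the arithmetic, assuming the cap acts as a simple minimum of the requested memory and the maximum. The 72 GB default request for STAR is an assumption about this release's resource labels and should be checked against the pipeline configuration; `effective_mem_gb` is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the memory (in GB) a task ends up with when its
# request is capped by --max_memory.
effective_mem_gb() {
    local requested=$1 cap=$2
    if [ "$requested" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$requested"
    fi
}

# Assumed default STAR request of 72 GB, capped by --max_memory '64.GB'.
effective_mem_gb 72 64
# With a cap above the request, the request is left untouched.
effective_mem_gb 72 128
```

Under these assumptions, the first call shows STAR receiving less memory than its default, which is the reduction described above.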

@fjosefdz
Author

fjosefdz commented May 22, 2024

Do you mean this?

path/to/test/test_output/nextflow_work/67/8bff9d83c65cb091f7207dd3befa38/.command.sh: line 10: 44 Segmentation fault STAR --genomeDir SAindex --readFilesIn sample_test_1_val_1.fq.gz sample_test_2_val_2.fq.gz --runThreadN 12 --outFileNamePrefix e_pal_1001_11_c_1_r. --sjdbGTFfile genome.filtered.gtf --outSAMattrRGline ID:e_pal_1001_11_c_1_r 'SM:e_pal_1001_11_c_1_r' --quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend --outSAMstrandField intronMotif

I also tried running with --aligner 'star_rsem' and it works. Does star_rsem use a different version of STAR? How can it run with RSEM but not with Salmon?

@fjosefdz
Author

This is the script I use to run the nf-core rnaseq pipeline through Slurm:

#!/bin/bash
#SBATCH --job-name=nfcore_rnaseq
#SBATCH --partition cpu
#SBATCH --nodes 1
#SBATCH --tasks-per-node 2
#SBATCH --cpus-per-task 24
#SBATCH --time 24:00:00
#SBATCH --mem 128G
#SBATCH --output /home/isilon/onko_datasets/test/%x_%j.out
#SBATCH --error /home/isilon/onko_datasets/test/%x_%j.err

source path/to/conda/miniconda3/bin/activate base
conda activate path/to/.conda/envs/env_nf

export NXF_SINGULARITY_CACHEDIR="path/to/nf-core_rnaseq/nf-core_rnaseq_singularity/"

nextflow run nf-core/rnaseq  \
      --input /path/to/samplesheet.csv  \
      --outdir /path/to/output/  \
      --genome GRCh37  \
      --aligner 'star_salmon'  \
      -profile singularity  \
      --max_memory '64.GB'  \
      --max_cpus 16  \
      --max_time '240.h'  \
      -r 3.14.0

@pinin4fjords
Member

Do you mean this?

No, if you have a look in the process work directory for STAR there should be more logs that might tell us where it's failing.

@fjosefdz
Author

fjosefdz commented May 27, 2024

These are the log and the run files from the directory where the error was generated (path/to/test/test_output/nextflow_work/f3/6ddef0ebba1b326eaa2d7ee3302030/).

.command.log
run_file.txt

@pinin4fjords
Member

pinin4fjords commented May 30, 2024

Normally we'd expect a STAR log file ending in .Log.out, and that's what we'd need to have some hope of understanding what STAR was doing when it seg faulted. Could you confirm that such a file is not present? If so it's going to be hard for us to understand what was happening when the seg fault occurred.
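
As a rough sketch of how to check for that file, the helper below lists the STAR logs (`*Log.out`, `*Log.progress.out`, `*Log.final.out`) and the Nextflow `.command.*` files in a task directory together with their sizes in bytes, so a missing or empty log is easy to spot. The function name `list_task_logs` is hypothetical, and the task directory is whatever path Nextflow reports for the failed task.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<size in bytes><TAB><file name>" for the STAR
# logs and Nextflow task files found in a task work directory.
list_task_logs() {
    local task_dir=$1 f
    for f in "$task_dir"/*Log.out "$task_dir"/*Log.progress.out \
             "$task_dir"/*Log.final.out "$task_dir"/.command.*; do
        [ -f "$f" ] || continue   # skip patterns that matched nothing
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$(basename "$f")"
    done
}

# Inspect the directory given as first argument (default: current directory).
list_task_logs "${1:-.}"
```

A size of 0 means the file exists but is empty, which would suggest STAR crashed before writing anything to it.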

But, here are some suggestions for you to follow as you debug.

1. Are you able to run the test_full profile of the workflow?

nextflow run nf-core/rnaseq  \
      --outdir /path/to/output/  \
      --genome GRCh37  \
      --aligner 'star_salmon'  \
      -profile test_full,singularity  \
      --max_memory '64.GB'  \
      --max_cpus 16  \
      --max_time '240.h'  \
      -r 3.14.0

If that fails to run, it would point to something specific to your HPC systems, which it would be hard for us to help with. If that does run, then there will be something specific to your input reads.

2. Can you run the STAR process manually?

I would suggest that you copy the task directory and try running the alignment manually, either with the Singularity image or a Conda environment. Note that this process uses STAR version 2.6.1d for compatibility with indices in iGenomes. That might provide you with more information about why things are failing to run.

3. Try with up-to-date references, not using iGenomes

Using iGenomes (i.e. --genome) is not currently recommended.

I would suggest that you try running by specifying inputs directly as per that documentation link. You will need sufficient resource to generate a STAR index, but you can do that just once by using the save_reference option to output the index files so you can store them elsewhere and supply them next time you run. This will use an up-to-date STAR which may not have the same issue for you.

Use the up-to-date Ensembl reference files if possible, but even if you require GRCh37, supplying the GTF and FASTA inputs directly rather than using --genome will use the newer STAR and may sidestep these issues.
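
For the second suggestion (running the STAR process manually), a minimal sketch might look like the following. It assumes the task directory holds symlinks to the staged inputs, so the copy dereferences them. `copy_task_dir`, the paths, and the image name are placeholders and not taken from this thread. Note that dereferencing also copies the full STAR index, which can be large.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a Nextflow task directory, replacing symlinks
# to staged inputs with real files so the copy is self-contained.
copy_task_dir() {
    local src=$1 dest=$2
    mkdir -p "$dest"
    # -R copies recursively, -L follows symlinks; "src/." includes dotfiles
    # such as .command.sh and .command.run.
    cp -RL "$src"/. "$dest"/
}

# Usage sketch (placeholder paths and image, adjust to your system):
#   copy_task_dir /path/to/work/<task_dir> /path/to/debug_task
#   cd /path/to/debug_task
#   singularity exec /path/to/star_image.img bash .command.sh
```

Running `.command.sh` interactively this way keeps the original work directory untouched and lets you watch STAR's output directly. Depending on the cluster setup, extra bind mounts may be needed for the container to see the copied files.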

@fjosefdz
Author

fjosefdz commented Jun 5, 2024

The STAR Log.out file is empty. I tried -profile test_full and got the same error at the same step, so I suppose the error is caused by my HPC system. I also used a genome and index that I generated myself, but the error persisted. I now have two questions:

  1. The error doesn't appear when I use RSEM instead of Salmon while still using STAR (--aligner "star_rsem"). Do you have any idea why this happens?
  2. Is there any way to feed the pipeline the BAM files generated by --aligner "star_rsem", so that it skips the previous steps and continues with the rest of the pipeline using Salmon as the mapper?

Thank you for all the help.

@pinin4fjords
Member

pinin4fjords commented Jun 5, 2024

Could you try without --genome (see above) please? The iGenomes option triggers a different STAR version, so you may find that not using --genome fixes your issue with no additional work. This applies even if you supply the same input files (you can even take them from https://github.com/nf-core/rnaseq/blob/master/conf/igenomes.config, the igenomes_base is here), though I would recommend you use newer files.

@fjosefdz
Author

fjosefdz commented Jun 5, 2024

Yes, I downloaded the genomes from AWS:

aws s3 --no-sign-request --region eu-west-1 sync s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/

aws s3 --no-sign-request --region eu-west-1 sync s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/ ./references/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/ --exclude "*" --include "genes.bed"

and provided:
--fasta /path/to/references/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/genome.fa
--gtf /path/to/references/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/genes.gtf
--star_index /path/to/references/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/SAindex
--gene_bed /path/to/references/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed

but the segfault still persisted.

@pinin4fjords
Member

Don't provide the STAR index (that won't be compatible). Assuming you have sufficient resource to do the STAR indexing, just provide the FASTA and GTF.

@fjosefdz
Author

I tried using "-profile test_full,singularity", and the segfault is still there. When I don't provide the index, there is still a segfault from STAR. It seems I need to look at what STAR does behind the scenes and whether it is interfering with access to the cluster. Do you have any ideas?

And also, as I asked before:
The error doesn't appear when I use RSEM instead of Salmon while still using STAR (--aligner "star_rsem"). Do you have any idea why this happens?
Is there any way to feed the pipeline the BAM files generated by --aligner "star_rsem", so that it skips the previous steps and continues with the rest of the pipeline using Salmon as the mapper?

Thanks.

@pinin4fjords
Member

By using --genome you are passing a star_index as stored in iGenomes, and triggering an old version of STAR via the STAR_ALIGN_IGENOMES process, which is necessary due to the age of the indices in iGenomes. test_full similarly uses --genome, so will likely have the same issue.

RSEM does not use the same process, so will be using a more updated STAR process behind the scenes.
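
One hedged way to confirm which alignment process a run actually used is to search the run's .nextflow.log for process names. The sketch below assumes that the relevant process names start with `STAR_ALIGN` (as `STAR_ALIGN_IGENOMES` does in this thread); `star_align_processes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct STAR_ALIGN* process names that
# appear in a Nextflow log file.
star_align_processes() {
    local log=$1
    grep -o 'STAR_ALIGN[A-Z_]*' "$log" | sort -u
}

# Usage sketch, run from the launch directory of the pipeline:
#   star_align_processes .nextflow.log
```

If the output includes STAR_ALIGN_IGENOMES, the run went through the older STAR used for the iGenomes indices.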

Please, if you can, try again with your data, supplying --fasta and --gtf, but NOT --genome OR --star_index. This will trigger a re-indexing using a newer version of STAR (not using STAR_ALIGN_IGENOMES), which I suspect will do the trick, provided you have sufficient resource.

You can use --save_reference to make sure you only need to do the indexing once.
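
Put together, the suggested run could be sketched as below. All paths are placeholders and `rnaseq_reindex` is a hypothetical wrapper; the flags themselves are the ones discussed in this thread. The wrapper only prints the command by default, so it can be checked before anything is launched.

```shell
#!/usr/bin/env bash
# Sketch: run nf-core/rnaseq with FASTA and GTF supplied directly, so the
# pipeline builds its own STAR index instead of using the iGenomes one.
# Deliberately passes neither --genome nor --star_index.
rnaseq_reindex() {
    local fasta=$1 gtf=$2 runner=
    # Dry run by default: print the command instead of launching it.
    if [ "${DRY_RUN:-1}" = "1" ]; then runner=echo; fi
    $runner nextflow run nf-core/rnaseq \
        -r 3.14.0 \
        -profile singularity \
        --input /path/to/samplesheet.csv \
        --outdir /path/to/output \
        --aligner star_salmon \
        --fasta "$fasta" \
        --gtf "$gtf" \
        --save_reference
}

# Placeholder reference paths; set DRY_RUN=0 to actually start the run.
rnaseq_reindex /path/to/genome.fa /path/to/genes.gtf
```

On later runs, the saved index can be supplied with --star_index, because it was built by the same newer STAR that the pipeline then uses for alignment.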

@fjosefdz
Author

Hi, I tried using the latest version of GRCh38 (v46), triggering a re-indexing, and now star+salmon works without segfaults. Thank you so much. I don't know whether I should open a new thread or whether you can answer me here: is there a way to run RSEM and Salmon in the same pipeline? I mean, run the pipeline once and get the results of both RSEM and Salmon. Or is it possible to feed the pipeline with BAM files?

Thank you.

@pinin4fjords
Member

Glad it worked!

No, not currently, you can only follow one 'path' at a time through the workflow. Please check the issue queue to see if others have requested the same things, and feel free to create feature requests if not.

Closing this issue as complete.
