NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES Segmentation fault #1294

fjosefdz · 2024-05-07T14:30:54Z

Description of the bug

Hi, I am trying to run the rnaseq pipeline (3.14.0), but I always get stuck at the STAR_ALIGN_IGENOMES step. The error is:

.command.sh: line 10: 42 Segmentation fault

I am using Singularity on an HPC cluster, submitting the job through Slurm. I tried adding more CPUs and more RAM, but the error is always the same. Is there a way to fix it? Is it a problem with the STAR version (2.7.10a)?

Command used and terminal output

source /home/conda/miniconda3/bin/activate env_nf

nextflow run nf-core/rnaseq  \
      --input /path/to/samplesheet.csv  \
      --outdir /path/to/output/  \
      --genome GRCh37  \
      --aligner 'star_salmon'  \
      -profile singularity  \
      --max_memory '64.GB'  \
      --max_cpus 16  \
      --max_time '240.h'  \
      -r 3.14.0

The error is similar to this one:
#684 (comment)

Relevant files

No response

System information

No response

maxulysse · 2024-05-07T14:44:50Z

Can you share more information, like the .nextflow.log file?

fjosefdz · 2024-05-08T06:44:38Z

Sure. Thank you.
.nextflow.log

pinin4fjords · 2024-05-14T10:55:44Z

Do you also have the STAR logs from the process directory?

Also, just to be clear, something like --max_memory '64.GB' sets the maximum bounds on memory for the workflow as a whole, and in the case of STAR this would be a reduction of the default memory allocation.

fjosefdz · 2024-05-22T05:44:31Z

Do you mean this?

path/to/test/test_output/nextflow_work/67/8bff9d83c65cb091f7207dd3befa38/.command.sh: line 10: 44 Segmentation fault STAR --genomeDir SAindex --readFilesIn sample_test_1_val_1.fq.gz sample_test_2_val_2.fq.gz --runThreadN 12 --outFileNamePrefix e_pal_1001_11_c_1_r. --sjdbGTFfile genome.filtered.gtf --outSAMattrRGline ID:e_pal_1001_11_c_1_r 'SM:e_pal_1001_11_c_1_r' --quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend --outSAMstrandField intronMotif

I also tried running with --aligner 'star_rsem' and it works. Is star_rsem using a different version of STAR? How can it run with RSEM but not with Salmon?

fjosefdz · 2024-05-23T06:41:19Z

This is the script I use to run the nf-core rnaseq pipeline through Slurm:

#!/bin/bash
#SBATCH --job-name=nfcore_rnaseq
#SBATCH --partition cpu
#SBATCH --nodes 1
#SBATCH --tasks-per-node 2
#SBATCH --cpus-per-task 24
#SBATCH --time 24:00:00
#SBATCH --mem 128G
#SBATCH --output /home/isilon/onko_datasets/test/%x_%j.out
#SBATCH --error /home/isilon/onko_datasets/test/%x_%j.err

source path/to/conda/miniconda3/bin/activate base
conda activate path/to/.conda/envs/env_nf

export NXF_SINGULARITY_CACHEDIR="path/to/nf-core_rnaseq/nf-core_rnaseq_singularity/"

nextflow run nf-core/rnaseq \
      --input /path/to/samplesheet.csv \
      --outdir /path/to/output/ \
      --genome GRCh37 \
      --aligner 'star_salmon' \
      -profile singularity \
      --max_memory '64.GB' \
      --max_cpus 16 \
      --max_time '240.h' \
      -r 3.14.0

pinin4fjords · 2024-05-23T12:28:02Z

> Do you mean this?

No, if you have a look in the process work directory for STAR there should be more logs that might tell us where it's failing.

fjosefdz · 2024-05-27T14:53:13Z

These are the log and the run files from the directory where the error was generated (path/to/test/test_output/nextflow_work/f3/6ddef0ebba1b326eaa2d7ee3302030/).

.command.log
run_file.txt

pinin4fjords · 2024-05-30T14:32:53Z

Normally we'd expect a STAR log file ending in .Log.out, and that's what we'd need to have some hope of understanding what STAR was doing when it seg faulted. Could you confirm that such a file is not present? If so it's going to be hard for us to understand what was happening when the seg fault occurred.
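One way to confirm whether such a file exists and whether it is empty is sketched below. The function name `list_star_logs` is made up for illustration, and the glob assumes STAR's usual `<prefix>Log.out`, `<prefix>Log.progress.out` and `<prefix>Log.final.out` naming inside the task directory.

```shell
# Sketch: report STAR log files in a Nextflow task directory.
# list_star_logs is a hypothetical helper, not part of the pipeline.
# Usage: list_star_logs <task_dir>
list_star_logs() {
    task_dir="$1"
    found=0
    # Matches e.g. sample.Log.out, sample.Log.progress.out, sample.Log.final.out
    for f in "$task_dir"/*Log*.out; do
        [ -e "$f" ] || continue   # glob did not match anything
        found=1
        if [ -s "$f" ]; then
            echo "non-empty: $f"
        else
            echo "EMPTY: $f"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no STAR log files found in $task_dir"
        return 1
    fi
}
```

Calling it on the failing task directory (for example the `nextflow_work/f3/6ddef0...` directory mentioned above) would show at a glance whether STAR wrote anything before it crashed.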

But, here are some suggestions for you to follow as you debug.

1. Are you able to run the test_full profile of the workflow?

nextflow run nf-core/rnaseq  \
      --outdir /path/to/output/  \
      --genome GRCh37  \
      --aligner 'star_salmon'  \
      -profile test_full,singularity  \
      --max_memory '64.GB'  \
      --max_cpus 16  \
      --max_time '240.h'  \
      -r 3.14.0

If that fails to run, it would point to something specific to your HPC systems, which it would be hard for us to help with. If that does run, then there will be something specific to your input reads.

2. Can you run the STAR process manually?

I would suggest that you copy the task directory and try running the alignment manually, either with the Singularity image or a Conda environment. Note that this process uses STAR version 2.6.1d for compatibility with indices in iGenomes. That might provide you with more information about why things are failing to run.
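A rough sketch of that manual re-run follows. It assumes STAR is already on the PATH of the current shell (for instance because the shell was started inside the Singularity image or an activated Conda environment). The helper name `rerun_task` and the `rerun.stdout` / `rerun.stderr` file names are invented for this example.

```shell
# Sketch: re-run a task's .command.sh in a scratch copy of its work directory.
# rerun_task is a hypothetical helper; STAR must already be on PATH.
# Usage: rerun_task <task_dir> <scratch_dir>
rerun_task() {
    task_dir="$1"
    scratch_dir="$2"
    mkdir -p "$scratch_dir" || return 1
    # -L dereferences the symlinks Nextflow uses to stage inputs, so the
    # scratch copy is independent of the original (this can need a lot of disk).
    cp -RL "$task_dir"/. "$scratch_dir"/ || return 1
    status=0
    (
        cd "$scratch_dir" || exit 1
        # Strict mode so the script stops at the first failing command.
        bash -euo pipefail .command.sh > rerun.stdout 2> rerun.stderr
    ) || status=$?
    # 139 = 128 + 11, i.e. the command was killed by SIGSEGV.
    echo "exit status: $status"
    return "$status"
}
```

An exit status of 139 from the copy would reproduce the segmentation fault outside Nextflow, and `rerun.stderr` plus any STAR log files in the scratch directory can then be inspected directly.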

3. Try with up-to-date references, not using iGenomes

Using iGenomes (i.e. --genome) is not currently recommended.

I would suggest that you try running by specifying inputs directly as per that documentation link. You will need sufficient resource to generate a STAR index, but you can do that just once by using the save_reference option to output the index files so you can store them elsewhere and supply them next time you run. This will use an up-to-date STAR which may not have the same issue for you.

Use the up-to-date Ensembl reference files if possible, but even if you require GRCh37, supplying the GTF and FASTA inputs directly rather than using --genome will use the newer STAR and may sidestep these issues.

fjosefdz · 2024-06-05T12:03:24Z

The STAR Log.out file is empty. I tried -profile test_full and got the same error at the same step, so I suppose the error is caused by my HPC system. I also used a genome and index that I generated myself, but the error persisted. I now have two questions:

1. The error doesn't appear when I use RSEM instead of Salmon, even though STAR is also used (--aligner "star_rsem"). Do you have any idea why this happens?
2. Is there any way to feed the pipeline the BAM files generated by --aligner "star_rsem", so that it skips the earlier steps and continues with the rest of the pipeline using Salmon?

Thank you for all the help.

pinin4fjords · 2024-06-05T12:13:47Z

Could you try without --genome (see above) please? The iGenomes option triggers a different STAR version, so you may find that not using --genome fixes your issue with no additional work. This applies even if you supply the same input files (you can even take them from https://github.com/nf-core/rnaseq/blob/master/conf/igenomes.config, the igenomes_base is here), though I would recommend you use newer files.
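To check which STAR process a given run actually used, the process name can be searched for in the run's `.nextflow.log`. This is only a sketch: `star_process_used` is a hypothetical helper, and it relies on the fully qualified process name from this issue's title appearing in the log.

```shell
# Sketch: detect whether a run went through the iGenomes-specific STAR process.
# star_process_used is a hypothetical helper name.
# Usage: star_process_used <path/to/.nextflow.log>
star_process_used() {
    log_file="$1"
    if grep -q 'ALIGN_STAR:STAR_ALIGN_IGENOMES' "$log_file"; then
        echo "STAR_ALIGN_IGENOMES"
    else
        echo "no STAR_ALIGN_IGENOMES task found"
    fi
}
```

If the function still reports `STAR_ALIGN_IGENOMES` after changing the parameters, the run is still going through the older STAR process.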

fjosefdz · 2024-06-05T13:21:22Z

Yes, I downloaded the genomes from AWS:

aws s3 --no-sign-request --region eu-west-1 sync s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/

aws s3 --no-sign-request --region eu-west-1 sync s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/ ./references/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/ --exclude "*" --include "genes.bed"

and provided:
--fasta /path/to/references/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/genome.fa
--gtf /path/to/references/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/genes.gtf
--star_index /path/to/references/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/SAindex
--gene_bed /path/to/references/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed

but the segfault still persisted.

pinin4fjords · 2024-06-05T14:16:24Z

Don't provide the STAR index (that won't be compatible). Assuming you have sufficient resource to do the STAR indexing, just provide the FASTA and GTF.

fjosefdz · 2024-06-13T13:48:54Z

I tried using "-profile test_full,singularity", and the segfault is still there. When I don't provide the index, there is still another segfault from STAR. It seems I need to look at what STAR does in the background and whether it is interfering with cluster access. Do you have any ideas?

Also, as I asked before:
1. The error doesn't appear when I use RSEM instead of Salmon, even though STAR is also used (--aligner "star_rsem"). Do you have any idea why this happens?
2. Is there any way to feed the pipeline the BAM files generated by --aligner "star_rsem", so that it skips the earlier steps and continues with the rest of the pipeline using Salmon?

Thanks.

pinin4fjords · 2024-06-13T14:40:19Z

By using --genome you are passing a star_index as stored in iGenomes, and triggering an old version of STAR via the STAR_ALIGN_IGENOMES process, which is necessary due to the age of the indices in iGenomes. test_full similarly uses --genome, so will likely have the same issue.

RSEM does not use the same process, so will be using a more updated STAR process behind the scenes.

Please, if you can, try again with your data, supplying --fasta and --gtf, but NOT --genome OR --star_index. This will trigger a re-indexing using a newer version of STAR (not using STAR_ALIGN_IGENOMES), which I suspect will do the trick, provided you have sufficient resource.

You can use --save_reference to make sure you only need to do the indexing once.
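As a sketch of what such a run could look like, the wrapper below assembles the command with --fasta, --gtf and --save_reference and leaves out --genome and --star_index. The function name `run_rnaseq_fresh_index` and the `DRY_RUN` switch are invented for this example, and all paths are placeholders. The --max_* options from the original command are omitted here on the assumption that index building should not be capped below the pipeline defaults.

```shell
# Sketch: launch nf-core/rnaseq with references supplied directly.
# run_rnaseq_fresh_index and DRY_RUN are hypothetical; paths are placeholders.
# Usage: run_rnaseq_fresh_index <genome.fa> <genes.gtf>
run_rnaseq_fresh_index() {
    fasta="$1"
    gtf="$2"
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} nextflow run nf-core/rnaseq \
        -r 3.14.0 \
        -profile singularity \
        --input /path/to/samplesheet.csv \
        --outdir /path/to/output/ \
        --aligner star_salmon \
        --fasta "$fasta" \
        --gtf "$gtf" \
        --save_reference
}
```

Running `DRY_RUN=1 run_rnaseq_fresh_index /path/to/genome.fa /path/to/genes.gtf` prints the command for review before it is submitted to the cluster.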

fjosefdz · 2024-06-18T09:40:17Z

Hi, I tried the latest version of GRCh38 (v46), triggering a re-indexing, and now star+salmon works without segfaults. Thank you so much. I don't know whether I should open a new thread or whether you can answer here: is there a way to run RSEM and Salmon in the same pipeline? I mean, run the pipeline once and get results from both RSEM and Salmon. Or is it possible to feed the pipeline with BAM files?

Thank you.

pinin4fjords · 2024-06-18T10:46:21Z

Glad it worked!

No, not currently, you can only follow one 'path' at a time through the workflow. Please check the issue queue to see if others have requested the same things, and feel free to create feature requests if not.

Closing this issue as complete.

fjosefdz added the bug Something isn't working label May 7, 2024

drpatelh added question Further information is requested and removed bug Something isn't working labels May 13, 2024

drpatelh added this to the 3.15.0 milestone May 13, 2024

drpatelh added the awaiting-response-developers label May 29, 2024

pinin4fjords added awaiting-response-community and removed awaiting-response-developers labels May 30, 2024

pinin4fjords closed this as completed Jun 18, 2024

pinin4fjords removed the awaiting-response-community label Jun 18, 2024
