Hyphenated sample names causes downstream error #1364

Open
mniederhuber opened this issue Aug 28, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@mniederhuber

Description of the bug

I ran into an error in the SummarizedExperiment process:

Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_TRANSCRIPT (all_samples)`

Error message is from R:

Error in findColumnWithAllEntries(ids, metadata) : 
No column contains all vector entries

Tracked it down to the parse_metadata function in the R script.

metadata_id_col <- findColumnWithAllEntries(ids, metadata)

I had used hyphens in my sample names, but the ids passed to findColumnWithAllEntries have every hyphen replaced with '.',
e.g. "D10-D_Na-R1" becomes "D10.D_Na.R1".

This appears to originate in the salmon output: the column names in salmon.merged.transcript_counts.tsv, which the R script uses to set the ids variable, already contain the altered sample names.

The easy fix is to correct the names in the sample sheet.

But it might be useful to add another check when initially parsing the sample sheet, to catch this right out of the gate.
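
As a rough illustration of what such an up-front check could look like, here is a minimal bash sketch. It is not part of nf-core/rnaseq: the function name `check_sample_names` is hypothetical, and it assumes a comma-separated sample sheet with a header row and the sample name in the first column. It treats a name as safe only if it starts with a letter and contains nothing but letters, digits, "_" and ".", which is a deliberately conservative rule.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of nf-core/rnaseq.
# Assumption: CSV sample sheet, header in row 1, sample name in column 1.

check_sample_names() {
    local sheet="$1"
    # Skip the header, take column 1, strip double quotes and carriage returns,
    # then de-duplicate (one sample usually spans several rows).
    awk -F',' 'NR > 1 { gsub(/"|\r/, "", $1); if ($1 != "") print $1 }' "$sheet" |
        sort -u |
        awk '
            # Flag anything that does not start with a letter or that contains
            # characters other than letters, digits, "_" and ".".
            $0 !~ /^[A-Za-z][A-Za-z0-9_.]*$/ {
                print "unsafe sample name: " $0
                bad = 1
            }
            # Exit status 1 if at least one name was flagged, 0 otherwise.
            END { exit bad }
        '
}
```

Calling `check_sample_names samplesheet.csv` before launching the pipeline would print one line per flagged name and return a non-zero status, so it could gate the `nextflow run` call in a submission script.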

Command used and terminal output

#!/bin/bash
#SBATCH --job-name=fashe
#SBATCH -p barc
#SBATCH -t 12:00:00
#SBATCH --mem=8G
#SBATCH -o log/rna-%j.out
#SBATCH -e log/rna-%j.err

if [ ! -d log ]; then
    mkdir log
fi

module load nextflow

# using the dev branch because of gzip bug that's been fixed
nextflow run nf-core/rnaseq \
    -profile unc_longleaf \
    -params-file conf/rnaseq_params.yaml \
    -r dev

Relevant files

No response

System information

Nextflow 24.04.2
HPC
Slurm
Singularity
Rhel8
nf-core/rnaseq dev branch

mniederhuber added the bug label Aug 28, 2024

idot commented Sep 10, 2024

The same error comes up when the sample names are numeric IDs. In that case R prepends X to the names in salmon.merged.gene_counts.tsv, and this function cannot find the sample column.
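
Both symptoms in this thread (hyphens turned into "." and an "X" prepended to numeric names) match how R builds syntactically valid names, as `make.names()` does. Whether the pipeline triggers this through `make.names()` directly or through a table reader with `check.names = TRUE` is an assumption here, not something confirmed in this issue. The bash sketch below uses a hypothetical helper, `r_style_name`, to approximate that rule so a sample name can be compared with what it would become. It is simplified: ASCII only, and it ignores R's reserved words.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating R's syntactic-name rule (simplified).
# Assumption: ASCII names only; R reserved words are not handled.

r_style_name() {
    local name="$1"
    # Step 1: prepend "X" unless the name starts with a letter,
    # or with a "." that is not followed by a digit.
    case "$name" in
        [A-Za-z]*) ;;
        .[0-9]*) name="X$name" ;;
        .*) ;;
        *) name="X$name" ;;
    esac
    # Step 2: replace every character other than letters, digits, "_" and "." with ".".
    printf '%s\n' "$name" | sed 's/[^A-Za-z0-9_.]/./g'
}

r_style_name 'D10-D_Na-R1'   # prints D10.D_Na.R1
r_style_name '123'           # prints X123
```

A sample name is unaffected only when `r_style_name` returns it unchanged, which gives a quick way to spot sample sheet entries that would no longer match the column names in the merged count tables.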
