Currently, nf-core/eager only allows extracting reads that did not align to the provided reference genome into a single FastQ file. This is not a problem when overlapping read pairs were merged during adapter removal, but non-merged sequencing data can contain read pairs and single reads at the same time. The simple extraction with `samtools view -f 4 /path/to/bam | samtools fastq` then produces a single file in which it is almost impossible to tell which reads belong to pairs and which are singletons.
Instead, nf-core/eager should automatically switch to a different, more complex but more accurate, extraction command when confronted with non-merged read pairs:
The first two steps ensure that paired reads have complementary flags, which is particularly important for the "unmapped" and "mate unmapped" flags. The `samtools view` command then uses a filtering expression to extract unmapped reads, distinguishing between read pairs and singletons: a read pair is extracted if at least one of its mates could not be aligned, while a singleton is extracted only if it is unmapped itself.
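A minimal sketch of a pipeline that matches this description, assuming samtools 1.12 or newer (needed for filter expressions with `-e`). The function names, the output file names and the exact options are placeholders chosen for illustration and may differ from the intended command. The helper `is_extracted` only mirrors the selection rule on a numeric SAM flag so that the logic can be checked without a BAM file.

```shell
# Sketch only: function and file names are hypothetical placeholders.

# Mirror of the selection rule on a numeric SAM FLAG value.
# Bits used: 1 = read paired, 4 = read unmapped, 8 = mate unmapped.
is_extracted() {
    flag=$1
    paired=$(( flag & 1 ))
    unmapped=$(( flag & 4 ))
    mate_unmapped=$(( flag & 8 ))
    if [ "$paired" -ne 0 ]; then
        # Read pair: keep if at least one mate could not be aligned.
        [ "$unmapped" -ne 0 ] || [ "$mate_unmapped" -ne 0 ]
    else
        # Singleton: keep only if the read itself is unmapped.
        [ "$unmapped" -ne 0 ]
    fi
}

# Assumed pipeline: name-sort, repair mate flags, filter, split into FastQ files.
# Usage: extract_unmapped /path/to/bam output_prefix
extract_unmapped() {
    bam=$1
    prefix=$2
    samtools sort -n "$bam" \
      | samtools fixmate - - \
      | samtools view -b \
          -e '(flag.paired && (flag.unmap || flag.munmap)) || (!flag.paired && flag.unmap)' - \
      | samtools fastq \
          -1 "${prefix}_1.fastq.gz" \
          -2 "${prefix}_2.fastq.gz" \
          -0 "${prefix}_other.fastq.gz" \
          -s "${prefix}_singleton.fastq.gz" -
}
```

In this sketch, reads without the paired flag end up in the `-0` file, because `samtools fastq` assigns reads by their READ1/READ2 flags, while both mates of every retained pair go to the `-1` and `-2` files.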
Using the more complex approach ensures that the FastQ file with the forward reads and the FastQ file with the reverse reads of a paired-end sequencing run contain the same number of reads.
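This property is easy to verify after extraction. The following is a small sketch, assuming uncompressed FastQ files with exactly four lines per record; the helper names are hypothetical, and gzip-compressed output would need to be decompressed first.

```shell
# Sketch only: helper names are hypothetical.

# Count records in an uncompressed four-line-per-record FastQ file.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Succeed only if the forward and reverse files hold the same number of reads.
same_read_count() {
    [ "$(count_fastq_reads "$1")" -eq "$(count_fastq_reads "$2")" ]
}
```

For example, `same_read_count sample_1.fastq sample_2.fastq || echo "mate files differ in size"` would flag an extraction that lost mates.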