processing paired reads #891

Open
pedres opened this issue Nov 27, 2024 · 0 comments
pedres commented Nov 27, 2024

Hi,
I have searched here and in the manual, but I am not sure how to proceed. The manual says that the "--paired option to kraken2 will indicate to kraken2 that the input files provided are paired read data, and data will be read from the pairs of files concurrently." So, as I understand it, to classify a sample with paired reads I have to pass the --paired flag. However, in "Metagenome analysis using the Kraken software suite", the kraken2 command in the microbiome protocol does not have the --paired flag, whereas the one in the pathogen protocol does. And in the code at https://github.com/martin-steinegger/kraken-protocol, none of the kraken2 commands use the --paired flag.
So, what would be the correct approach to process a set of paired reads?
In fact, if I run kraken2 with and without the flag, I get:
kraken2 --db $DATABASE --memory-mapping --threads 20 --report krak_test/VCM180_paired.k2report --paired shotgun_NOVOG/VCM180_R1.fq.gz shotgun_NOVOG/VCM180_R2.fq.gz > krak_test/VCM180.kraken2
Loading database information... done.
68391295 sequences (20505.76 Mbp) processed in 390.736s (10501.9 Kseq/m, 3148.79 Mbp/m).
205168 sequences classified (0.30%)
68186127 sequences unclassified (99.70%)

kraken2 --db $DATABASE --memory-mapping --threads 20 --report krak_test/VCM180_notpaired.k2report shotgun_NOVOG/VCM180_R1.fq.gz shotgun_NOVOG/VCM180_R2.fq.gz > krak_test/VCM180.kraken2
Loading database information... done.
136782590 sequences (20505.76 Mbp) processed in 362.249s (22655.6 Kseq/m, 3396.41 Mbp/m).
197818 sequences classified (0.14%)
136584772 sequences unclassified (99.86%)

bracken -d $DATABASE -i krak_test/VCM180_notpaired.k2report -o krak_test/VCM180_notpaired.bracken -w krak_test/VCM180_notpaired.breport -r 150 -l S
bracken -d $DATABASE -i krak_test/VCM180_paired.k2report -o krak_test/VCM180_paired.bracken -w krak_test/VCM180_paired.breport -r 150 -l S

Without the --paired flag, kraken2 processes twice as many sequences (the reads in the R1 and R2 fastq files are classified separately), and this affects the counts in the k2report and the Bracken estimates.
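
A quick sanity check of the numbers in the two logs can be sketched as below. The counts are copied from the kraken2 output above. The variable names and the `pct` helper are only illustrative and are not part of kraken2 or Bracken.

```shell
#!/usr/bin/env bash
# Counts copied from the two kraken2 logs above.
paired_total=68391295
paired_classified=205168
unpaired_total=136782590
unpaired_classified=197818

# Hypothetical helper: percentage of classified sequences, two decimals.
pct() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f", 100 * c / t }'
}

# Without --paired, each mate is counted as its own sequence,
# so the total should be exactly twice the paired total.
ratio=$(( unpaired_total / paired_total ))
remainder=$(( unpaired_total % paired_total ))

echo "unpaired/paired total: ${ratio} (remainder ${remainder})"
echo "paired classified:   $(pct "$paired_classified" "$paired_total")%"
echo "unpaired classified: $(pct "$unpaired_classified" "$unpaired_total")%"
```

The ratio comes out as exactly 2 with remainder 0, and the percentages match the 0.30% and 0.14% reported in the logs. So the run without --paired counts every mate as a separate sequence rather than counting each pair once.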

Thank you very much for your help
VCM180_notpaired_bracken.txt
VCM180_notpaired_k2report.txt
VCM180_paired_bracken.txt
VCM180_paired_k2report.txt

Manuel
