Empty fields in Kraken2 report file generated with version 2.1.3 and PlusPF database break Bracken downstream analysis #888

Open
Seb-vb opened this issue Nov 12, 2024 · 3 comments


Seb-vb commented Nov 12, 2024

Hello,

I believe this issue is the same one as in #883. I hope to add some relevant details to make the problem clearer.

I am using Kraken2 version 2.1.3 with the PlusPF database (September 2024 version) and am encountering a problem with the generated report file (example below). This affects downstream analysis, as Bracken cannot process the report file without manual correction.

The fourth column is supposed to contain an abbreviation specifying the taxonomic level (rank code) of each taxon. However, some cells are empty, such as those for "root" and "Bacteria", which should contain "R" and "D" respectively. A few other cells, such as the one for "cellular organisms" in this example, contain only a digit when the value should be "R1". This happens several times across the whole report: consistently for domains, and sporadically at other taxonomic levels.

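| 1 (%) | 2 (clade reads) | 3 (direct reads) | 4 (rank code) | 5 (taxid) | 6 (name) |
|---|---|---|---|---|---|
| 35.93 | 2548128 | 2548128 | U | 0 | unclassified |
| 64.07 | 4542966 | 6838 | | 1 | root |
| 62.13 | 4405890 | 999 | 1 | 131567 | cellular organisms |
| 61.73 | 4377315 | 62068 | | 2 | Bacteria |
| 28.7 | 2034982 | 33109 | D1 | 1783272 | Terrabacteria group |
| 26.75 | 1896932 | 4877 | P | 1239 | Bacillota |
| 13.33 | 945053 | 377 | C | 909932 | Negativicutes |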

As mentioned, Bracken crashes on the incomplete report, but it runs correctly after manual correction.
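One way to catch this before Bracken fails is to flag every report row whose fourth column is not a valid rank code. The snippet below is a minimal, hypothetical check (it is not part of Kraken2 or Bracken). It assumes a tab-separated report with the rank code, taxid, and name in the last three columns, and that a valid code is one of the letters U, R, D, K, P, C, O, F, G, S optionally followed by digits (as in `R1` or `D1`).

```python
import re
import sys

# Assumed rank-code format: one rank letter, optionally followed by a depth
# number (e.g. "R", "R1", "D1", "S2").
RANK_CODE = re.compile(r"^[URDKPCOFGS]\d*$")


def find_malformed_rows(report_lines):
    """Return (line_number, taxid, name, rank_field) for rows with an invalid rank code."""
    bad = []
    for lineno, line in enumerate(report_lines, start=1):
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # not a report row
        # Rank, taxid and name are assumed to be the last three columns.
        rank, taxid, name = cols[-3].strip(), cols[-2].strip(), cols[-1].strip()
        if not RANK_CODE.match(rank):
            bad.append((lineno, taxid, name, rank))
    return bad


if __name__ == "__main__":
    with open(sys.argv[1]) as fh:
        problems = find_malformed_rows(fh)
    for lineno, taxid, name, rank in problems:
        print(f"line {lineno}: taxid {taxid} ({name}) has rank field {rank!r}")
    # Non-zero exit status if any row needs fixing.
    sys.exit(1 if problems else 0)
```

Run as, for example, `python check_k2_report.py sample.k2report` (hypothetical script and file names); on the excerpt above it would flag the "root", "cellular organisms" and "Bacteria" rows.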

I encountered this problem when running the program on different samples from different sources, so I'm fairly confident the samples aren't the cause. However, I'm not sure whether the problem comes from Kraken2 itself, the PlusPF database, or an error on my side. What are your thoughts?

I hope this was clear enough, please tell me if any other details can be useful.

Seb-vb changed the title from "Empty fields in Kraken2 report file generated with version 2.1.3 and PlusPF database" to "Empty fields in Kraken2 report file generated with version 2.1.3 and PlusPF database break Bracken downstream analysis" on Nov 19, 2024.
Xiang-Leo commented

I ran into the same problem using the standard database. I tried adding the domain codes manually, but many taxa that should have been assigned a different rank, such as family or order, ended up labelled "D".

Xiang-Leo commented

After checking, I confirmed that the newest version produces this kind of output. Installing version 2.0.8 with conda resolves all of these problems.


Seb-vb commented Nov 28, 2024

I didn't think to check previous versions of Kraken2, thanks for the information @Xiang-Leo.

On my side, I wrote a script that fixes the 2.1.3 k2 report by assigning the correct taxonomic level where needed. It uses the "ktaxonomy.tsv" file included with the pre-built databases, which conveniently contains the abbreviated taxonomic level of every taxon in the database.
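A repair step along the lines described above might look like the sketch below. This is not the author's script, only a minimal illustration under assumptions: it assumes each `ktaxonomy.tsv` line is laid out as `taxid | parent taxid | rank code | depth | name` with tab-padded pipe separators (check this against your own database copy), and it overwrites the report's rank field only when that field is empty or malformed. Rows whose taxid is missing from `ktaxonomy.tsv` are left unchanged. Looking the rank up per taxid, rather than filling every blank cell with the same letter, avoids labelling non-domain taxa as "D".

```python
import re
import sys

# Assumed rank-code format: one rank letter plus an optional depth number.
RANK_CODE = re.compile(r"^[URDKPCOFGS]\d*$")


def load_ktaxonomy_ranks(lines):
    """Map taxid -> rank code, assuming 'taxid | parent | rank | depth | name' lines."""
    ranks = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 3 and fields[0]:
            ranks[fields[0]] = fields[2]
    return ranks


def fix_report_lines(report_lines, ranks):
    """Return report lines (without newlines) with empty/malformed rank codes filled in."""
    fixed = []
    for line in report_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 6:
            # Rank code and taxid are assumed to be the 3rd- and 2nd-to-last columns.
            rank, taxid = cols[-3].strip(), cols[-2].strip()
            if not RANK_CODE.match(rank) and taxid in ranks:
                cols[-3] = ranks[taxid]
        fixed.append("\t".join(cols))
    return fixed


if __name__ == "__main__":
    # Hypothetical usage: python fix_k2_report.py report.txt ktaxonomy.tsv fixed.txt
    report_path, ktax_path, out_path = sys.argv[1:4]
    with open(ktax_path) as fh:
        ranks = load_ktaxonomy_ranks(fh)
    with open(report_path) as fh:
        fixed = fix_report_lines(fh, ranks)
    with open(out_path, "w") as out:
        out.write("\n".join(fixed) + "\n")
```

Only the rank column is touched, so indentation in the name column and all read counts are preserved as Kraken2 wrote them.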

For now, the best workaround does seem to be simply using version 2.0.8. Hopefully this bug can be fixed in the next release.
