extract_kraken_reads with --exclude and --include-children on multiple taxids #30

Open
mhyleung opened this issue Jan 19, 2021 · 7 comments

Comments

@mhyleung

Dear all

It does not appear to be possible to pass multiple species-level taxids to the extract_kraken_reads command while using both the --exclude and --include-children options to remove the indicated taxids as well as their strain-level descendants. Is there a workaround if I want to remove the reads of multiple species and all their strains from the Kraken output?

Thanks

Marcus

@jenniferlu717
Owner

Ohhh, I did not think of this use case.

I can modify the script to allow multiple taxids with those options, but it will take a day or so.

Without changes, you can run it once per species, excluding one species at a time, but the code isn't the fastest.

@mhyleung
Author

Hi, thanks for the quick response. I would greatly appreciate it if the script could be changed to support this. We are trying to remove a large list of species-level taxids, several of which have multiple strains that we would also like to remove, so running the script once per taxid would be more time-consuming than, say, having the script updated :)

Thanks again for your help!

Regards

Marcus

@jenniferlu717
Owner

Hi @mhyleung, I double-checked the code and it looks like it is already set up to handle multiple taxids with --include-children and --exclude.

What is the behavior of the script when you try to specify those options?

@mhyleung
Author

Hey!

So this is my command and error:

extract_kraken_reads.py -k [sample].krk -s1 [sample]_1.fastq -s2 [sample]_2.fastq -o [sample]_extract_1.fastq -o2 [sample]_extract_2.fastq --taxid 1164002 2768834 2743575 2664899 2663009 2652307 2610896 2598453 2576376 2563897 2545799 2545797 2530390 2518971 2506452 2492396 2488819 2487892 2487072 2487071 2479767 2382163 2109692 --exclude --include-children

And the error:

ERROR: --report not specified.

I guess I can just add a --report option and indicate a file?

PS: While we are on the subject, do you think you could also enable passing a taxid list file? We had another analysis with about 250 species-level taxids (and their strains) that we wanted to remove from the dataset. We tried to see if we could supply a text file with the list of taxids, but it seemed we could only list them out as part of the command?

Thank you so much!

Marc

@jenniferlu717
Owner

Yes, you need to add a Kraken report file (--report myfile.kreport).
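
For reference, the command posted above with the report file added could be assembled as follows. This is only a sketch: `build_extract_command` is a hypothetical helper, the sample and report names are placeholders, and only flags that already appear in this thread are used.

```python
import shlex


def build_extract_command(sample, taxids, report):
    """Assemble an extract_kraken_reads.py call that excludes the given
    taxids and their children. File naming mirrors the command posted above."""
    return [
        "extract_kraken_reads.py",
        "-k", f"{sample}.krk",
        "-s1", f"{sample}_1.fastq",
        "-s2", f"{sample}_2.fastq",
        "-o", f"{sample}_extract_1.fastq",
        "-o2", f"{sample}_extract_2.fastq",
        "--taxid", *[str(t) for t in taxids],
        "--exclude",
        "--include-children",
        # The Kraken report is required when --include-children is used.
        "--report", report,
    ]


if __name__ == "__main__":
    # Placeholder sample name and a shortened taxid list for illustration.
    cmd = build_extract_command(
        "sample", [1164002, 2768834, 2743575], "myfile.kreport"
    )
    print(shlex.join(cmd))
```

Printing the command first makes it easy to check before running it by hand or passing the list to `subprocess.run`.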

Currently the taxids have to be listed as part of the command. It may be possible to modify the code to take a text file of taxids, but that is a bit more specialized and I don't think it would make sense to make it a function of the default extract_kraken_reads code. I can make a specialized copy of the code that allows that if you want?
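
In the meantime, one possible workaround is to read the taxids from a text file and expand them onto the command line. A minimal sketch, assuming a file with one taxid per line: the file name `taxids.txt`, the helper `read_taxids`, and the skipping of blank and `#` lines are all assumptions for illustration, not features of extract_kraken_reads.

```python
import subprocess
from pathlib import Path


def read_taxids(path):
    """Return the taxids listed in a text file, one per line.

    Blank lines and lines starting with '#' are skipped (an assumed
    convention); anything else that is not purely numeric is rejected.
    """
    taxids = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not line.isdigit():
            raise ValueError(f"not a numeric taxid: {line!r}")
        taxids.append(line)
    return taxids


if __name__ == "__main__":
    taxids = read_taxids("taxids.txt")  # hypothetical list file
    cmd = [
        "extract_kraken_reads.py",
        "-k", "sample.krk",              # placeholder file names
        "-s1", "sample_1.fastq",
        "-s2", "sample_2.fastq",
        "-o", "sample_extract_1.fastq",
        "-o2", "sample_extract_2.fastq",
        "--taxid", *taxids,
        "--exclude",
        "--include-children",
        "--report", "myfile.kreport",
    ]
    subprocess.run(cmd, check=True)
```

This keeps the default script untouched: the list file is only expanded into the same `--taxid` arguments that would otherwise be typed by hand.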

@mhyleung
Author

That would be great! Thanks for your help!

@dnolin13

dnolin13 commented Mar 8, 2022

I would also love a copy of this code - I am in a similar situation where I have a list of taxids in a text file that I want to pull out of my fastq file. Thanks!
