Fastp does not remove G homopolymers #576

Open
Homap opened this issue Sep 26, 2024 · 1 comment


Homap commented Sep 26, 2024

Hello,

In my Illumina NovaSeq data, many reads contain G and C homopolymers. I ran fastp with the --trim_poly_g option.

However, this option only detects reads with at least 10 Gs at the end and trims those 10 Gs. If the whole read is made up of Gs, the read is kept and is only 10 base pairs shorter. In addition, if G homopolymers appear in the middle of a read, this option does not remove them.

I could easily write a Python script to filter reads based on GC%, but with 300 million reads it would probably take forever to finish.
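For reference, a filter of that kind might look roughly like the sketch below. It is only a minimal illustration, not a benchmarked tool: the thresholds `MIN_RUN` and `MAX_GC_FRACTION` and the helper names `keep_read`, `filter_fastq` and `open_text` are made up for this example, and it assumes single-end FASTQ with exactly four lines per record. It streams the file record by record, so memory use stays flat, but throughput on 300 million reads is untested.

```python
import gzip
import re
from itertools import islice

# Hypothetical thresholds; tune them for your own data.
MIN_RUN = 10            # drop reads with a run of >= 10 identical G or C bases
MAX_GC_FRACTION = 0.9   # drop reads that are almost entirely G/C

# Matches a G run or a C run of at least MIN_RUN bases anywhere in the read.
RUN_RE = re.compile(r"G{%d,}|C{%d,}" % (MIN_RUN, MIN_RUN))


def keep_read(seq):
    """Return True if the read has no long G/C run and is not almost pure G/C."""
    if not seq:
        return False
    if RUN_RE.search(seq):
        return False
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq) <= MAX_GC_FRACTION


def filter_fastq(src, dst):
    """Copy 4-line FASTQ records from src to dst, dropping flagged reads.

    Returns a (kept, dropped) tuple of record counts.
    """
    kept = dropped = 0
    while True:
        record = list(islice(src, 4))
        if len(record) < 4:      # end of file (or a truncated last record)
            break
        if keep_read(record[1].strip().upper()):
            dst.writelines(record)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


def open_text(path, mode="r"):
    """Open a plain or gzip-compressed file in text mode."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, mode + "t")
```

To use it, open the input with `open_text(path)` and the output with `open_text(path, "w")`, then pass both handles to `filter_fastq`. For paired-end data the two files would have to be filtered together so that mates stay in sync, which this sketch does not handle.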

Is there any way you would suggest for doing this filtering in an efficient way?

[Two screenshots attached: Screenshot 2024-09-26 at 09 43 17, Screenshot 2024-09-26 at 09 43 38]
@neil-n-zhang

I had a similar problem recently. One workaround is to turn off other trimming, such as adapter trimming, and pair polyG trimming with a read length filter (--length_required). That way, reads shortened by polyG trimming fall below the length threshold and are discarded, and the remaining reads will have little polyG.
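As a rough sketch of that workaround, the snippet below assembles a single-end fastp command line in Python. The file names, the helper name `build_fastp_cmd`, and the thresholds of 50 and 10 are placeholders chosen for this example; `--trim_poly_g`, `--poly_g_min_len`, `--disable_adapter_trimming` and `--length_required` are fastp options, but check `fastp --help` for your installed version before relying on them.

```python
import shlex


def build_fastp_cmd(fq_in, fq_out, min_len=50, poly_g_min_len=10):
    """Build a single-end fastp call: polyG trimming plus a length filter.

    min_len and poly_g_min_len are illustrative values, not recommendations.
    """
    return [
        "fastp",
        "-i", fq_in,
        "-o", fq_out,
        "--trim_poly_g",                          # trim polyG tails
        "--poly_g_min_len", str(poly_g_min_len),  # minimum G run to count as polyG
        "--disable_adapter_trimming",             # keep other trimming out of the way
        "--length_required", str(min_len),        # discard reads shorter than this
    ]


# Placeholder file names; print the command so it can be inspected before running.
cmd = build_fastp_cmd("reads.fastq.gz", "reads.filtered.fastq.gz")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right. Setting the length threshold close to the original read length makes the filter stricter, since any read that loses bases to polyG trimming is then dropped.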
