
pal_finder: fix 'fastq_subset.py' utility for big input Fastq files #80

Open: wants to merge 1 commit into base: pal_finder-0.02.04.9


pjbriggs
Member

This PR updates the read counting in the fastq_subset.py utility used by the pal_finder tool so that it can handle large Fastq files.

Without the fix, the read counting reads the entire Fastq file into memory at once, which fails with the following error when the file is too large:

Traceback (most recent call last):
  File "XXXX/toolshed.g2.bx.psu.edu/repos/pjbriggs/pal_finder/52dbe2089d14/pal_finder/fastq_subset.py", line 133, in <module>
    nreads = count_reads(args.fastq_r1)
  File "XXXX/toolshed.g2.bx.psu.edu/repos/pjbriggs/pal_finder/52dbe2089d14/pal_finder/fastq_subset.py", line 70, in count_reads
    buf = fq.read()
MemoryError
FATAL ERROR pal_finder failed to complete successfully

The fix reads the file in fixed-size chunks, each of which should fit into memory, so the error is avoided.
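A minimal sketch of chunked read counting, for illustration only. It assumes an uncompressed Fastq file where every record occupies exactly four lines (no line-wrapped sequences). The name count_reads matches the function in the traceback, but the body and the hypothetical chunk_size parameter are assumptions, not the code from this PR.

```python
def count_reads(fastq, chunk_size=1024 * 1024):
    """Count reads in a Fastq file without loading it all into memory.

    Hypothetical sketch: assumes four lines per record and reads at most
    chunk_size bytes at a time, counting newline characters per chunk.
    """
    nlines = 0
    last_byte = b"\n"
    with open(fastq, "rb") as fq:
        while True:
            buf = fq.read(chunk_size)  # bounded read instead of fq.read()
            if not buf:
                break
            nlines += buf.count(b"\n")
            last_byte = buf[-1:]
    # A final line without a trailing newline still counts as a line
    if last_byte != b"\n":
        nlines += 1
    # Each Fastq record is assumed to span exactly four lines
    return nlines // 4
```

Peak memory is bounded by chunk_size rather than by the file size, and counting bytes in binary mode avoids the cost of decoding text.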

pjbriggs force-pushed the pal_finder-fix-fastq_subset.py-for-big-files branch from b888abb to ba3432f on April 13, 2021 15:02
pjbriggs changed the base branch from master to pal_finder-0.02.04.9 on April 21, 2021 10:55

1 participant