pal_finder: fix 'fastq_subset.py' utility for big input Fastq files #80

pjbriggs · 2021-04-13T12:45:32Z

This PR updates the read counting in the fastq_subset.py utility used by the pal_finder tool so that it can handle large Fastq files.

Without the fix, the read counting attempts to read the entire Fastq file into memory, which results in the following error if the file is too big:

Traceback (most recent call last):
  File "XXXX/toolshed.g2.bx.psu.edu/repos/pjbriggs/pal_finder/52dbe2089d14/pal_finder/fastq_subset.py", line 133, in <module>
    nreads = count_reads(args.fastq_r1)
  File "XXXX/toolshed.g2.bx.psu.edu/repos/pjbriggs/pal_finder/52dbe2089d14/pal_finder/fastq_subset.py", line 70, in count_reads
    buf = fq.read()
MemoryError
FATAL ERROR pal_finder failed to complete successfully

The fix reads the Fastq file in chunks that should each fit into memory, rather than in a single read() call, and so avoids the error.
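For illustration, chunked read counting could look something like the sketch below. This is not the code from the PR: the function name count_reads is taken from the traceback, but the chunk_size parameter, its default value, and the newline-counting approach are assumptions made for the example. It also assumes an uncompressed Fastq file with exactly four lines per read.

```python
def count_reads(fastq, chunk_size=1024 * 1024):
    """Count the reads in an uncompressed Fastq file.

    Illustrative sketch only: 'chunk_size' is a hypothetical parameter,
    and each read is assumed to occupy exactly four lines.
    """
    nlines = 0
    last_byte = b"\n"
    with open(fastq, "rb") as fq:
        while True:
            # Read at most chunk_size bytes, so memory use stays bounded
            # regardless of the total size of the file
            buf = fq.read(chunk_size)
            if not buf:
                break
            nlines += buf.count(b"\n")
            last_byte = buf[-1:]
    # Account for a final line that has no trailing newline
    if last_byte != b"\n":
        nlines += 1
    return nlines // 4
```

Because only one chunk is held at a time, peak memory depends on chunk_size rather than on the size of the input file.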

pjbriggs added the tool/pal_finder label Apr 13, 2021

tools/pal_finder: fix 'fastq_subset.py' utility when dealing with big…

ba3432f

… Fastqs.

pjbriggs force-pushed the pal_finder-fix-fastq_subset.py-for-big-files branch from b888abb to ba3432f Compare April 13, 2021 15:02

pjbriggs changed the base branch from master to pal_finder-0.02.04.9 April 21, 2021 10:55
