I have a paired-end single-cell RNA-seq dataset. R1 consists of all the reads, and R2 of the barcodes necessary to identify which cell a read belongs to. If I now trim only R1 to keep the high-quality reads, R1 and R2 end up out of sync.
It seems like pyfastx can solve this problem for me by keeping only those reads in R2 that still have a counterpart in R1:
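import gzip

# reads: pyfastx Fastq object for the trimmed R1; barcodes: Fastq object for R2
with gzip.open(output, 'wt') as f:
    for read in reads:
        barcode = barcodes[read.id]  # random-access lookup of the matching R2 read
        f.write(barcode.raw)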
However, from a really sloppy benchmark, it seems that just getting barcode.raw takes around 0.0015 seconds per read, whereas the lookup of the read itself is fast (about 1e-6 seconds). This would mean I have to wait two days to filter my FASTQ. Is there an easier, better, or faster way of doing this?
In your case, barcodes[read.id] performs a random access into the file for the read with the given id. This step first retrieves the read's information from the index file (a SQLite database file), which can be very slow when processing a large number of reads.
I am a little confused about why you use read.id rather than read.name to extract the reads. Using read.id implies that the reads in your two files are synchronized.
If you extract reads from the other file by name, you can use multiple threads to speed this up.
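If the trimmer only drops reads and never reorders them, another option is to skip random access entirely and stream both files once, in lockstep. The following is a minimal sketch of that idea in plain Python, not a pyfastx feature and not something proposed in this thread. It assumes that both files are gzip-compressed four-line FASTQ, that the trimmed R1 is an order-preserving subset of R2, and that mates share the same name up to the first whitespace (an optional /1 or /2 suffix is ignored). The names fastq_records and sync_barcodes are hypothetical helpers introduced only for this illustration.

```python
import gzip
from itertools import islice


def fastq_records(handle):
    """Yield (name, record_text) for each 4-line FASTQ record in a text handle."""
    while True:
        lines = list(islice(handle, 4))
        if not lines:
            return
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        # Name = header without the leading '@', up to the first whitespace.
        # A trailing /1 or /2 mate suffix is dropped so R1 and R2 names match.
        name = lines[0][1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, "".join(lines)


def sync_barcodes(trimmed_r1, r2, output):
    """Write the R2 records whose names also occur in trimmed_r1.

    Assumption: trimmed_r1 is an order-preserving subset of r2.
    Returns the number of records written.
    """
    written = 0
    with gzip.open(trimmed_r1, "rt") as f1, \
            gzip.open(r2, "rt") as f2, \
            gzip.open(output, "wt") as out:
        r2_records = fastq_records(f2)
        for name, _ in fastq_records(f1):
            # Advance R2 until the mate of the current R1 read is found.
            for r2_name, r2_text in r2_records:
                if r2_name == name:
                    out.write(r2_text)
                    written += 1
                    break
            else:
                # R2 was exhausted without finding the mate.
                raise ValueError(f"{name} not found in R2; files may not share order")
    return written
```

Each file is read exactly once and only one record per file is held in memory, so the cost grows linearly with file size rather than with the number of index lookups. If the order assumption does not hold, the function raises an error instead of silently writing a mismatched file; in that case a name-based lookup as suggested above is the safer route.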