To test different approaches for assembling genomes, I needed data with known microbial content. Only long reads were available, but I needed to test the algorithm on short paired-end reads. These scripts create short paired-end reads from long reads.
🤗 I would appreciate feedback if you find bugs in the code or in the logic of the program. Please DM me on Twitter (@KaterinaPantiuk).
sample.fasta - input data example
cut_long_reads.py - step 1. Script for extracting short reads from long reads, as shown in the scheme below
cut_zero_reads.py - step 2. Script to filter a pair of reads if one or both reads in the pair are shorter than 100 bp
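The two steps listed above can be illustrated with a minimal sketch. This is not the code from cut_long_reads.py or cut_zero_reads.py, and the cutting scheme is an assumption: each long read is split into consecutive fragments, the forward read is taken from the start of a fragment, and the reverse read is the reverse complement of its end. The function names and the `fragment_len` and `read_len` defaults are hypothetical; only the 100 bp threshold comes from the description of step 2.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def cut_long_read(
    seq: str,
    fragment_len: int = 500,  # hypothetical fragment (insert) size
    read_len: int = 150,      # hypothetical short-read length
) -> List[Tuple[str, str]]:
    """Step 1 (assumed scheme): cut one long read into paired short reads."""
    pairs = []
    for start in range(0, len(seq), fragment_len):
        fragment = seq[start:start + fragment_len]
        forward = fragment[:read_len]
        reverse = reverse_complement(fragment[-read_len:])
        pairs.append((forward, reverse))
    return pairs


def filter_short_pairs(
    pairs: List[Tuple[str, str]],
    min_len: int = 100,  # threshold stated for cut_zero_reads.py
) -> List[Tuple[str, str]]:
    """Step 2: drop a pair if one or both reads are shorter than min_len."""
    return [
        (r1, r2) for r1, r2 in pairs
        if len(r1) >= min_len and len(r2) >= min_len
    ]


if __name__ == "__main__":
    long_read = "ACGT" * 262 + "AC"  # toy 1050 bp long read
    pairs = cut_long_read(long_read)
    kept = filter_short_pairs(pairs)
    print(len(pairs), "pairs cut,", len(kept), "pairs kept")
```

In this sketch the last fragment of a long read is usually shorter than the others, so its reads can fall below 100 bp; that is the case the second step removes.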
Schematic representation of read cutting:
Kateryna Pantiukh
This project is licensed under the MIT License