Request: Support gzipped fastq files #31

Open
aboyher opened this issue May 8, 2020 · 4 comments

Comments

@aboyher

aboyher commented May 8, 2020

Requesting support for gzipped fastq files.

@ShaolinXU

Agree!
It would be much easier to incorporate SRAssembler into a workflow if it supported gzipped fastq files.

@vpbrendel
Member

Let me understand: Are you suggesting that within SRAssembler we run "gunzip" if the read input ends in *.gz? Instead of doing this outside the SRAssembler call in the workflow?

@ShaolinXU

That could work,
but it would be better to support gzipped fastq files without a separate unzip step.
Does the vmatch indexing process need fasta files? If so, maybe we could use a tool like seqkit split2 to split the fastq input directly into fasta files, which would be easier.

@gwct

gwct commented Apr 17, 2024

Hello,
I'm just attempting to use SRAssembler and wanted to add that I also think support for compressed fastq files would be nice. My C++ is rusty, but I know that in other languages it's not necessary to explicitly run gunzip before running the program; instead, the program can check for compression when it opens the file and then read it through a suitable library (maybe zlib for C++?).

But given that it works as is, I think just a note in the documentation saying that the fastq files need to be uncompressed would be helpful.
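
As a rough sketch of the "check the compression as the file is read" idea, the snippet below tests for the two gzip magic bytes (0x1f 0x8b) using only the C++ standard library. The function names `looks_gzipped` and `check_reads_file` are hypothetical and are not part of SRAssembler. Actually reading compressed data would still need a library such as zlib, whose `gzopen`/`gzread` functions are documented to pass non-gzip input through unchanged when reading.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file starts with the gzip
// magic bytes 0x1f 0x8b. Returns false if the file cannot be opened
// or is shorter than two bytes.
bool looks_gzipped(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    unsigned char magic[2] = {0, 0};
    in.read(reinterpret_cast<char*>(magic), 2);
    return in.gcount() == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}

// Hypothetical pre-flight check: returns an empty string if the reads
// file looks usable, otherwise a human-readable reason.
std::string check_reads_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return "cannot open reads file: " + path;
    if (looks_gzipped(path)) {
        return "reads file appears to be gzip-compressed, decompress it first: " + path;
    }
    return "";
}
```

Checking the content rather than the `.gz` extension also catches compressed files that have been renamed. A check like this, run before the reads are split, could turn a crash into a clear message.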


To be thorough, and in case anyone else tries to search for the error text, this is the output and error I see when I try to run on compressed fastq files using a libraries file (-l). This is likely also related to #33. This is also the same error I get when one of those files listed does not exist.

singularity run -e -B $(pwd) sra.sif -q sfGFP.fa -t dna -p SRAssembler/demo/SRAssembler.conf -o srasm-out-1 -l libraries.txt -r srasm-preprocess -A 1 -k 15:10:45 -s mouse
[2024-04-17 14:49:44] [INFO] SRAssembler v1.0.0 command: SRAssembler -q sfGFP.fa -t dna -p SRAssembler/demo/SRAssembler.conf -o srasm-out-1 -l libraries.txt -r srasm-preprocess -A 1 -k 15:10:45 -s mouse
[2024-04-17 14:49:44] [INFO] Total processors: 1
[2024-04-17 14:49:44] [INFO] We have 4 libraries
[2024-04-17 14:49:44] [INFO] library 1: B6_AON_S1_L001_R1_001.fastqlibrary
[2024-04-17 14:49:44] [INFO] insert size: 350
[2024-04-17 14:49:44] [INFO] left read: fastq/B6_AON_S1_L001_R1_001.fastq.gz
[2024-04-17 14:49:44] [INFO] right read: fastq/B6_AON_S1_L001_R2_001.fastq.gz
[2024-04-17 14:49:44] [INFO] reversed: 0
[2024-04-17 14:49:44] [INFO] Paired-end: 1
[2024-04-17 14:49:44] [INFO] library 2: B6_AON_S1_L002_R1_001.fastqlibrary
[2024-04-17 14:49:44] [INFO] insert size: 350
[2024-04-17 14:49:44] [INFO] left read: fastq/B6_AON_S1_L002_R1_001.fastq.gz
[2024-04-17 14:49:44] [INFO] right read: fastq/B6_AON_S1_L002_R2_001.fastq.gz
[2024-04-17 14:49:44] [INFO] reversed: 0
[2024-04-17 14:49:44] [INFO] Paired-end: 1
[2024-04-17 14:49:44] [INFO] library 3: B6_AON_S1_L003_R1_001.fastqlibrary
[2024-04-17 14:49:44] [INFO] insert size: 350
[2024-04-17 14:49:44] [INFO] left read: fastq/B6_AON_S1_L003_R1_001.fastq.gz
[2024-04-17 14:49:44] [INFO] right read: fastq/B6_AON_S1_L003_R2_001.fastq.gz
[2024-04-17 14:49:44] [INFO] reversed: 0
[2024-04-17 14:49:44] [INFO] Paired-end: 1
[2024-04-17 14:49:44] [INFO] library 4: B6_AON_S1_L004_R1_001.fastqlibrary
[2024-04-17 14:49:44] [INFO] insert size: 350
[2024-04-17 14:49:44] [INFO] left read: fastq/B6_AON_S1_L004_R1_001.fastq.gz
[2024-04-17 14:49:44] [INFO] right read: fastq/B6_AON_S1_L004_R2_001.fastq.gz
[2024-04-17 14:49:44] [INFO] reversed: 0
[2024-04-17 14:49:44] [INFO] Paired-end: 1
[2024-04-17 14:49:44][DOING] Now pre-processing the reads files ...
[2024-04-17 14:49:44][DOING] Splitting read library 1 ...
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 1) > this->size() (which is 0)
Command terminated by signal 6
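
For anyone puzzling over the message above: `std::string::substr` throws `std::out_of_range` when the requested position is past the end of the string, so calling it with position 1 on an empty line gives exactly this text with libstdc++. Where this happens inside SRAssembler is not shown in the log, so the following is only a sketch with hypothetical function names, assuming the reader slices the leading `@` off a header line that turned out to be empty.

```cpp
#include <stdexcept>
#include <string>

// Unchecked: throws std::out_of_range when `line` is empty, because
// position 1 is past the end of a zero-length string.
std::string read_id_unchecked(const std::string& line) {
    return line.substr(1);
}

// Checked: validates the header line before slicing it, so the caller
// gets a message that points at the input instead of at substr.
std::string read_id_checked(const std::string& line) {
    if (line.empty() || line[0] != '@') {
        throw std::runtime_error(
            "malformed or missing FASTQ header line "
            "(is the reads file missing or compressed?)");
    }
    return line.substr(1);
}
```

An empty first line is what a reader would see if the file could not be opened at all, which would be consistent with getting the same error for a nonexistent file.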
