-
Notifications
You must be signed in to change notification settings - Fork 15
Utility program for extracting sequences from a fasta/fastq file
License
bcthomas/pullseq
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Summary: Software to extract sequence from a fasta or fastq. Also filter sequences by a minimum length or maximum length. Fast, written in C, using kseq.h library. Pullseq Summary: pullseq - extract sequences from a fasta/fastq file. This program is fast, and can be useful in a variety of situations. You can use it to extract sequences from one fasta/fastq file into a new file, given either a list of header ids to include or a regular expression pattern to match. Results can be included (default) or excluded, and they can additionally be filtered with minimum / maximum sequence lengths. Additionally, it can convert from fastq to fasta or visa-versa and can change the length of the output sequence lines. NOTE: pullseq prints to standard out, so you need to use redirection (e.g. pullseq input.fasta -m 10 *>* output.fasta ) to create output files. Synopsis: pullseq -i <input fasta/fastq file> -n <header names to select> pullseq -i <input fasta/fastq file> -m <minimum sequence length> pullseq -i <input fasta/fastq file> -g <regex name to match> pullseq -i <input fasta/fastq file> -m <minimum sequence length> -a <max sequence length> pullseq -i <input fasta/fastq file> -t cat <names to select from STDIN> | pullseq -i <input fasta/fastq file> -N Options: -i, --input, Input fasta/fastq file (required) -n, --names, File of header id names to search for -N, --names_stdin, Use STDIN for header id names -g, --regex, Regular expression to match (PERL compatible; always case-insensitive) -m, --min, Minimum sequence length -a, --max, Maximum sequence length -l, --length, Sequence characters per line (default 50) -c, --convert, Convert input to fastq/fasta (e.g. if input is fastq, output will be fasta) -q, --quality, ASCII code to use for fasta->fastq quality conversions -e, --excluded, Exclude the header id names in the list (-n) -t, --count, Just count the possible output, but don't write it -h, --help, Display this help and exit -v, --verbose, Print extra details during the run --version, Output version information and exit =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Seqdiff Summary: seqdiff - compare two fasta (or fastq) files to determine overlap of sequences. This overlap can be at the sequence level (are two sequences exactly the same in both files?) or at the header name level (do two sequences contain the same header name between the two files?). Synopsis: seqdiff -1 first_file.fa -2 second_file.fa Usage: seqdiff -1 <first input fasta/fastq file> -2 <second fasta/fastq file> Options: -1, --first, First sequence file (required) -2, --second, Second sequence file (required) -a, --a_output, File name for uniques from first file -b, --b_output, File name for uniques from second file -c, --c_output, File name for common entries -d, --headers, Compare headers instead of sequences (default: false) -s, --summary, Just show summary stats? (default: false) -h, --help, Display this help and exit -v, --verbose, Print extra details during the run --version, Output version information and exit REQUIREMENTS: Pullseq/Seqdiff require a C compiler and has been tested to work with either GCC or clang. They also require (and include) kseq.h (Heng Li) and uthash.h (http://troydhanson.github.com/uthash/). kseq.h also requires Zlib (so your linker should be able to handle the '-lz' option). You can obtain zlib from http://www.zlib.net/ or commonly from your OS package manager (e.g. apt-get zlib or emerge zlib). NEW INSTALL: Pullseq uses CMake, so you must have CMake installed on your system. git clone: https://github.com/bcthomas/pullseq.git cd pullseq mkdir build cd build cmake .. This will build binaries in build/src/ > build/src/pullseq > build/src/seqdiff OLD INSTALL: To install, do the following in a shell on your system... From Git: git clone https://github.com/bcthomas/pullseq.git # checkout the code using git cd pullseq ./bootstrap # get set up for config/build after cloning ./configure # configure the application based on your system make # will build the application make install # will install in /usr/local by default From a Release file (tar or zip): tar xvf pullseq_version.tar.gz cd pullseq_version ./autoconf # make sure configuration is set ./configure # configure the application based on your system make # will build the application make install # will install in /usr/local by default NOTE: If you have PCRE (perl-compatible regular expression library) installed in a non-standard location (e.g. on a mac using brew), the ./configure script will fail. You'll need to update your CFLAGS and LDFLAGS env settings to define where your PCRE library files were installed. For example, on a mac with pcre installed by brew, you can do this: pcre-config --cflags -I/usr/local/Cellar/pcre/8.39/include Then you can just add this to a env CFLAGS variable and run the configure command, like so... export CFLAGS="-I/usr/local/Cellar/pcre/8.39/include" ./configure If your pcre library is installed somewhere else, you just update the CFLAGS env variable accordingly.
About
Utility program for extracting sequences from a fasta/fastq file
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published