Mate pair linker removal #56

Open
hussius opened this issue Oct 26, 2011 · 8 comments

@hussius

hussius commented Oct 26, 2011

We need to include Paul's script for mate-pair linker removal, for mate-pair runs ONLY. Perhaps it could be part of a general screening module together with the PhiX, contamination, and adapter screeners.

@brainstorm

@CosteaPaul, can you give me some pointers to the script @hussius is referring to?

@brainstorm

To do PhiX removal on paired-end reads (which will almost always be the case) rather than single-end reads, an example command would be:

bowtie --solexa1.3-quals --un sample_nophiX.fastq /bubo/nobackup/uppnex/reference/biodata/genomes/phiX174/phix/bowtie/phix -1 sample_1.fastq -2 sample_2.fastq /dev/null

Note! Running this on already demultiplexed samples (which will have Illumina index 3) is fine. If it is run on the non-demultiplexed, whole-lane FASTQ files, where the intention is to deliver just those whole-lane files, the last 7 nucleotides (the barcode) should be removed before running the command. This can be done with a trivial script that removes 7 bases from the end of the sequence and quality lines in the FASTQ files.
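
As a rough illustration of the "trivial script" mentioned above, here is a minimal Python sketch. It is not the script actually used here; the function name `trim_fastq` and the constant `BARCODE_LENGTH` are made up for this example, and it assumes plain, uncompressed FASTQ with strict four-line records.

```python
import io

BARCODE_LENGTH = 7  # trailing bases to strip (the barcode), per the note above


def trim_fastq(in_handle, out_handle, n_bases=BARCODE_LENGTH):
    """Copy FASTQ records, dropping the last n_bases of sequence and quality.

    Assumes four-line records: header, sequence, '+', quality.
    """
    for line_number, line in enumerate(in_handle):
        text = line.rstrip("\n")
        # Within each record, line 1 is the sequence and line 3 the quality
        # string (0-based); header and '+' lines are copied unchanged.
        if line_number % 4 in (1, 3) and n_bases > 0:
            text = text[:-n_bases]
        out_handle.write(text + "\n")


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file handles instead.
    record = "@read1\nACGTACGTACATCACGA\n+\nIIIIIIIIIIIIIIIII\n"
    trimmed = io.StringIO()
    trim_fastq(io.StringIO(record), trimmed)
    print(trimmed.getvalue(), end="")
```

For a paired-end lane the same function would be applied to each of the two FASTQ files before handing them to bowtie.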

@ghost ghost assigned hussius and brainstorm Nov 24, 2011
@brainstorm

I'm thinking of adding a field to run_info.yaml, "filter_out_gnomes", on a per-sample basis, such as:

  multiplex:
  - barcode_id: '1'
    barcode_type: Illumina
    name: BAC11
    sequence: ATCACG
    filter_out_genomes: ecoli, phix

That way we get rid of the second scenario (whole lane) and apply the filter only to the samples that matter. @chapmanb, we spike in phiX on sample 3, which is why this feature is needed, but I think it generalizes better this way. In addition, it could eventually be exposed as an additional field in the ngLIMS.
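
To make the idea concrete, here is a small Python sketch of how the pipeline might read the proposed field. Nothing like this exists yet: the helper `genomes_to_filter` is hypothetical, and the dict stands in for one multiplex entry after run_info.yaml has been loaded.

```python
def genomes_to_filter(multiplex_entry):
    """Return the genome names listed in a sample's filter_out_genomes field.

    Accepts either a comma-separated string (as in the YAML above) or a
    list; a missing or empty field yields an empty list.
    """
    raw = multiplex_entry.get("filter_out_genomes") or ""
    if isinstance(raw, str):
        raw = raw.split(",")
    return [name.strip() for name in raw if name.strip()]


# One multiplex entry as it would look once the YAML is loaded into a dict.
sample = {
    "barcode_id": "1",
    "barcode_type": "Illumina",
    "name": "BAC11",
    "sequence": "ATCACG",
    "filter_out_genomes": "ecoli, phix",
}

for genome in genomes_to_filter(sample):
    print("%s: screen against %s" % (sample["name"], genome))
```

Samples without the field simply get an empty list, so existing run_info.yaml files would keep working unchanged.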

@chapmanb

Roman;
That's a nice idea. Thanks for looking at this; let me know how I can help.

@hussius
Author

hussius commented Nov 24, 2011

"filter_out_gnomes" - was that typo inspired by the approaching Christmas mood? :-)

@brainstorm

LOL X"D

Well, you never know what you might find in those magical FastQ files ;P


@hussius
Author

hussius commented Jan 12, 2012

A better option for mate pair linker removal would be to use a modified version of Deloxer (http://genomes.sdsc.edu/downloads/deloxer/). The modified version is written by Ino DeBruijn and handles more cases than the original Deloxer. I'll ask him to put it on GitHub.

Note: the PhiX removal still needs to be included in the pipeline.

@ghost ghost assigned b97pla Feb 28, 2012
vals pushed a commit that referenced this issue May 4, 2012
Fix for localization and parsing error
@brainstorm

From Brad:

I'm actively trying to move into using other external programs upstream of bcbio-nextgen instead of coding this directly.
