Numerous undetected adaptors in B,C,D cats #11

MichaelFokinNZ · 2014-06-12T09:47:21Z

Hi Richard!
I've decided to open a new issue to share more information about undetected adaptors.
I am working with 300 bp MiSeq reads. My pipeline is
(raw data -> nextclip -> fastq-mcf -> blastn); the last two steps check whether any adaptors are still present, and finally I check these cases manually in Geneious.
The "A" files hardly suffer from junction adaptors: fewer than 30 per 1M reads are left (fastq-mcf), and I haven't inspected these in detail.
The "B" and "C" files look worse :( fastq-mcf detects from hundreds up to 23k adaptors per 1M reads. Note that this software can only detect adaptors at the start or end of a read, not inside the sequence, so the real number is higher.
I've analysed some of these files in detail and found that:

Only a few (dozens of) duplicated adaptors are left; all of these cases have a 1-nucleotide indel at the junction site.
There are plenty of single adaptors with a 100% hit to the read, in both terminal and internal positions... :( I would say a few thousand per 1M reads, plus more partial adaptors of fewer than 18 nucleotides.
I have not analysed read pairs yet.

I have no experience with, or ideas about, whether this could affect de novo assembly, but I will try to avoid using the B and C categories.
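
Since fastq-mcf only reports adaptors at read ends, a direct scan of each read can separate terminal from internal hits. The following is a minimal sketch, not part of NextClip or fastq-mcf. It assumes the junction adaptor is the 19-base Nextera mate-pair sequence `CTGTCTCTTATACACATCT` and its reverse complement, and it only finds exact matches; the function names are hypothetical.

```python
from collections import Counter

# Assumption: 19-base Nextera mate-pair junction adaptor and its reverse complement.
JUNCTION = "CTGTCTCTTATACACATCT"
JUNCTION_RC = "AGATGTGTATAAGAGACAG"


def classify_hit(seq, adaptor):
    """Return 'start', 'end', 'internal', or None for the first exact adaptor match."""
    pos = seq.find(adaptor)
    if pos == -1:
        return None
    if pos == 0:
        return "start"
    if pos + len(adaptor) == len(seq):
        return "end"
    return "internal"


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def count_adaptor_hits(sequences, adaptors=(JUNCTION, JUNCTION_RC)):
    """Count reads by the position of their first exact adaptor match."""
    counts = Counter()
    for seq in sequences:
        counts["reads"] += 1
        for adaptor in adaptors:
            where = classify_hit(seq, adaptor)
            if where is not None:
                counts[where] += 1
                break  # count each read at most once
    return counts
```

To use it, open a NextClip output FASTQ file and pass `read_fastq_sequences(handle)` to `count_adaptor_hits`. Partial adaptors and adaptors containing indels are not found by this exact search, so blastn or a similar aligner is still needed for those cases.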

richardmleggett · 2014-06-12T13:53:12Z

Hi,

Thanks for this.

Are you saying that there are category B and C cases where, after processing with NextClip, all 19 bases of the junction adaptor are still present? If so, could you send me a file of example reads (e.g. a few hundred reads) from before processing? I will then try to work out what is going on…
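
One way to assemble such an example file is to pull the raw records for reads that still show an adaptor after clipping. This is a rough sketch under two assumptions: that read names in the NextClip output still match those in the raw FASTQ files, and that records span exactly four lines. The helper names and the `limit` default are hypothetical.

```python
def read_name(header):
    """Return the read name from a FASTQ header, without '@', comment, or /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def extract_records(fastq_lines, wanted_names, limit=300):
    """Yield up to `limit` 4-line FASTQ records whose read name is in wanted_names."""
    found = 0
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if read_name(record[0]) in wanted_names:
                yield record
                found += 1
                if found >= limit:
                    return
            record = []
```

Running it on both the R1 and R2 raw files with the same set of names keeps the pairs together, which matters because NextClip classifies pairs rather than single reads.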

For de novo assembly, we would use categories A, B and C, but leave out D.

Thanks,
Richard

MesutOezil · 2014-06-12T16:34:59Z

Hi all,

I have been using NextClip for the genomes of a few different species and for several different mate interval ranges. However, I have never seen cases like this, with undetected or remaining internal junction adaptors.

For your information, the latest version of FastQC (released early this month; http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) can detect Nextera junction adaptors in reads and report them in the 'Adapter Content' panel. This new function is very useful for evaluating mate-pair read properties and for confirming that adaptors have been removed after running NextClip. I hope this helps you too.

Best regards,

Shigehiro

richardmleggett · 2014-06-12T16:44:51Z

Thanks.

I meant to mention earlier: don't forget that you can adjust the match parameters with --strict_match.

Thanks,
Richard

MichaelFokinNZ · 2014-06-16T00:37:46Z

Thanks, guys!
I will analyse my data more closely (the new FastQC is awesome!) and provide you with some files.
