Numerous undetected adaptors in B,C,D cats #11

Open
MichaelFokinNZ opened this issue Jun 12, 2014 · 4 comments

Comments

@MichaelFokinNZ

Hi Richard,
I've decided to open a new issue to share more information about undetected adaptors.
I am working with 300 bp MiSeq reads. My pipeline is
(raw data -> nextclip -> fastq-mcf -> blastn). The last two steps check whether any adaptors are still present, and finally I inspect those cases manually in Geneious.
The "A" files are almost free of junction adaptors: fewer than 30 per 1M reads remain (according to fastq-mcf), and I haven't inspected these in detail.
The "B" and "C" files look worse :( fastq-mcf detects from hundreds up to 23k adaptors per 1M reads. Note that this software only detects adaptors at the start or end of a read, not inside the sequence, so the real number is higher.
I've analysed some of these files in detail and found that:

  1. Only a few (dozens) duplicated adaptors remain; all of them have a 1-nucleotide indel at the junction site.
  2. There are plenty of single adaptors with a 100% hit to the read, in both terminal and internal positions... :( I would say a few thousand per 1M reads, plus more partial adaptors shorter than 18 nucleotides.
  3. I have not analysed read pairs yet.

I have no experience with, or idea about, whether this could affect de novo assembly, but I will try not to avoid using the B and C categories.
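
The terminal-versus-internal distinction above can be checked with a short script. The following is only a minimal sketch, not part of NextClip or fastq-mcf. It assumes the single junction adaptor is the 19-base sequence `CTGTCTCTTATACACATCT` (with its reverse complement), counts exact matches only, and expects a plain 4-line-per-record FASTQ file. The function names are hypothetical.

```python
from collections import Counter

# Assumption: 19-base Nextera mate-pair junction adaptor and its reverse complement.
JUNCTION = "CTGTCTCTTATACACATCT"
JUNCTION_RC = "AGATGTGTATAAGAGACAG"


def classify_read(seq, adaptors=(JUNCTION, JUNCTION_RC)):
    """Return 'none', 'terminal' or 'internal' for exact adaptor matches in seq."""
    found = "none"
    for adaptor in adaptors:
        pos = seq.find(adaptor)
        while pos != -1:
            at_start = pos == 0
            at_end = pos + len(adaptor) == len(seq)
            if not (at_start or at_end):
                # Any match that does not touch a read end counts as internal.
                return "internal"
            found = "terminal"
            pos = seq.find(adaptor, pos + 1)
    return found


def read_sequences(fastq_path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def summarise(seqs):
    """Count reads per class ('none', 'terminal', 'internal')."""
    return Counter(classify_read(s) for s in seqs)
```

Calling `summarise(read_sequences("some_file.fastq"))` on a NextClip output file (the file name here is a placeholder) gives a rough count per class. Adaptors with indels or mismatches, such as the duplicated adaptors in point 1, are not caught by exact matching and would still need blastn or a similar aligner.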

@richardmleggett
Owner

Hi,

Thanks for this.

Are you saying that there are category B and C cases where, after processing with NextClip, all 19 bases of the junction adaptor are still present? If so, could you send me a file of example reads (e.g. a few hundred reads) from before processing? Then I will try to work out what is going on.
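
One way to prepare such an example file is sketched below. This is an assumption about workflow rather than anything NextClip requires: it copies the first few hundred records from an uncompressed, 4-line-per-record FASTQ file, and the function name and default count are hypothetical.

```python
from itertools import islice


def write_example_reads(in_path, out_path, n_reads=300):
    """Copy the first n_reads records of a 4-line-per-record FASTQ file.

    Returns the number of complete records written.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        # Each FASTQ record spans four lines: header, sequence, '+', qualities.
        lines = list(islice(src, 4 * n_reads))
        dst.writelines(lines)
    return len(lines) // 4
```

For paired data, running this with the same `n_reads` on the R1 and R2 files keeps the pairs in step, provided both files list the reads in the same order.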

For de novo assembly, we would use categories A, B and C, but leave out D.

Thanks,
Richard

@MesutOezil

Hi all,

I have been using NextClip on genomes of several species and across several mate interval ranges, but I have never seen cases like this, with undetected internal junction adaptors remaining.

For your information, the latest version of FastQC (released earlier this month; http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) can detect Nextera junction adaptors in reads and reports them in the 'Adapter Content' panel. This new function is very useful for evaluating mate-pair read properties and for confirming that adaptors have been removed after running NextClip. I hope this helps you too.

Best regards,

Shigehiro

@richardmleggett
Owner

Thanks.

I meant to mention earlier: don't forget that you can adjust the match parameters with --strict_match.

Thanks,
Richard

@MichaelFokinNZ
Author

Thanks guys!
I will analyse my data more carefully (the new FastQC is awesome!) and send you some files.
