diff --git a/topics/sequence-analysis/faqs/quality_score.md b/topics/sequence-analysis/faqs/quality_score.md
index f59e5f8fa0e841..557a78ef817368 100644
--- a/topics/sequence-analysis/faqs/quality_score.md
+++ b/topics/sequence-analysis/faqs/quality_score.md
@@ -3,7 +3,7 @@ title: Quality Scores
 area: format
 box_type: details
 layout: faq
-contributors: [bebatut, hexylena]
+contributors: [bebatut, nakucher, hexylena]
 ---

 But what does this quality score mean?
diff --git a/topics/sequence-analysis/tutorials/quality-control/tutorial.md b/topics/sequence-analysis/tutorials/quality-control/tutorial.md
index 060d7e82b68a6f..c7a12eb58fb596 100644
--- a/topics/sequence-analysis/tutorials/quality-control/tutorial.md
+++ b/topics/sequence-analysis/tutorials/quality-control/tutorial.md
@@ -112,22 +112,7 @@ GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGAFFGGFGG

 It means that the fragment named `@M00970` corresponds to the DNA sequence `GTGCCAGCCGCCGCGGTAGTCCGACGTGGCTGTCTCTTATACACATCTCCGAGCCCACGAGACCGAAGAACATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAGAAGCAAATGACGATTCAAGAAAGAAAAAAACACAGAATACTAACAATAAGTCATAAACATCATCAACATAAAAAAGGAAATACACTTACAACACATATCAATATCTAAAATAAATGATCAGCACACAACATGACGATTACCACACATGTGTACTACAAGTCAACTA` and this sequence has been sequenced with a quality `GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGAFFGGFGGGGGGGGFGGGGGGGGGGGGGGFGGG+38+35*311*6,,31=******441+++0+0++0+*1*2++2++0*+*2*02*/***1*+++0+0++38++00++++++++++0+0+2++*+*+*+*+*****+0**+0**+***+)*.***1**//*)***)/)*)))*)))*),)0(((-((((-.(4(,,))).,(())))))).)))))))-))-(`.

-But what does this quality score mean?
-
-The quality score for each sequence is a string of characters, one for each base of the nucleic sequence, used to characterize the probability of mis-identification of each base. The score is encoded using the ASCII character table (with [some historical differences](https://en.wikipedia.org/wiki/FASTQ_format#Encoding)):
-
-![Encoding of the quality score with ASCII characters for different Phred encoding. The ascii code sequence is shown at the top with symbols for 33 to 64, upper case letters, more symbols, and then lowercase letters. Sanger maps from 33 to 73 while solexa is shifted, starting at 59 and going to 104. Illumina 1.3 starts at 54 and goes to 104, Illumina 1.5 is shifted three scores to the right but still ends at 104. Illumina 1.8+ goes back to the Sanger except one single score wider. Illumina](../../../sequence-analysis/images/fastq-quality-encoding.png)
-
-So there is an ASCII character associated with each nucleotide, representing its [Phred quality score](https://en.wikipedia.org/wiki/Phred_quality_score), the probability of an incorrect base call:
-
-Phred Quality Score | Probability of incorrect base call | Base call accuracy
----- | --- | ---
-10 | 1 in 10 | 90%
-20 | 1 in 100 | 99%
-30 | 1 in 1000 | 99.9%
-40 | 1 in 10,000 | 99.99%
-50 | 1 in 100,000 | 99.999%
-60 | 1 in 1,000,000 | 99.9999%
+{% snippet topics/sequence-analysis/faqs/quality_score.md %}

 >
 >
diff --git a/topics/sequence-analysis/tutorials/sars-with-galaxy-on-anvil/tutorial.md b/topics/sequence-analysis/tutorials/sars-with-galaxy-on-anvil/tutorial.md
index 8172583e7a5995..76f8fee558de00 100644
--- a/topics/sequence-analysis/tutorials/sars-with-galaxy-on-anvil/tutorial.md
+++ b/topics/sequence-analysis/tutorials/sars-with-galaxy-on-anvil/tutorial.md
@@ -259,18 +259,7 @@ You will open up a summary report for the sequencing file:
 > {: .solution}
 {: .question}

-> Learn more about quality scores
->
-> You may be wondering how the fourth line of the .fastq files relates to the quality score above. To save space, the sequencer records an [ASCII character](http://drive5.com/usearch/manual/quality_score.html) to represent scores 0-42. For example 10 corresponds to “+” and 40 corresponds to “I”. FastQC knows how to translate this. This is often called “Phred” scoring.
-> What does 0-42 represent? These numbers, when plugged into a formula, tell us the probability of an error for that base. This is the formula, where Q is our quality score (0-42) and P is the probability of an error:
->
->Q = -10 log10(P)
->
->Using this formula, we can calculate that a quality score of 40 means only 0.00010 probability of an error!
->
-> Learn more from the [Quality Control Tutorial FAQs](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/faqs/).
->
-{: .details}
+{% snippet topics/sequence-analysis/faqs/quality_score.md %}

 # Exercise Three: Alignment
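The passages that this change moves into the `quality_score.md` snippet describe two steps. First, each FASTQ quality character maps to a Phred score through its ASCII code. Second, that score converts to an error probability with Q = -10 log10(P). Below is a minimal Python sketch of both. It assumes the Phred+33 offset used by Sanger and Illumina 1.8+ files; older Illumina pipelines used a different offset, which the `offset` parameter can override. The function names are illustrative only and are not part of FastQC or Galaxy.

```python
import math

# Assumption: Phred+33 encoding (Sanger / Illumina 1.8+).
PHRED_OFFSET = 33


def decode_quality(quality: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Turn a FASTQ quality string into one Phred score per base."""
    # Each character's ASCII code minus the offset is the Phred score Q.
    return [ord(char) - offset for char in quality]


def error_probability(q: float) -> float:
    """Probability that a base call is wrong, given its Phred score q."""
    # Solving Q = -10 * log10(P) for P gives P = 10^(-Q/10).
    return 10 ** (-q / 10)


def phred_score(p: float) -> float:
    """Phred score for an error probability p, with 0 < p <= 1."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    scores = decode_quality("GF+I")
    print(scores)  # [38, 37, 10, 40]
    for q in scores:
        p = error_probability(q)
        print(f"Q={q}: P(error)={p:.5f}, accuracy={1 - p:.3%}")
```

For example, `decode_quality("GF+I")` gives `[38, 37, 10, 40]`, which agrees with “+” being 10 and “I” being 40 in the removed text. `error_probability(40)` is 10^-4, that is, 1 in 10,000 (99.99% accuracy), as in the removed Phred table.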