-
Notifications
You must be signed in to change notification settings - Fork 0
/
11th-diary.html
583 lines (567 loc) · 89.5 KB
/
11th-diary.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
<!DOCTYPE html>
<html lang="en">
<head>
<title>11th Diary</title>
<meta charset="utf-8" />
<link rel="stylesheet" href="/theme/css/main.css" type="text/css" />
<!--[if IE]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script><![endif]-->
<!--[if lte IE 7]>
<link rel="stylesheet" type="text/css" media="all" href="/css/ie.css"/>
<script src="/js/IE8.js" type="text/javascript"></script><![endif]-->
<!--[if lt IE 7]>
<link rel="stylesheet" type="text/css" media="all" href="/css/ie6.css"/><![endif]-->
</head>
<body id="index" class="home">
<header id="banner" class="body">
<h1><a href="/index.html">Codon Alignment GSoC Homepage </a></h1>
<nav><ul>
<li class="active"><a href="/category/gsoc.html">GSoC</a></li>
<li ><a href="/category/me.html">Me</a></li>
<li ><a href="/category/timeline.html">Timeline</a></li>
</ul></nav>
</header><!-- /#banner -->
<section id="content" class="body">
<article>
<header>
<h1 class="entry-title">
<a href="11th-diary.html" rel="bookmark"
title="Permalink to 11th Diary">11th Diary</a></h1>
</header>
<div class="entry-content">
<footer class="post-info">
<abbr class="published" title="2013-09-21T18:12:00">
Sat 21 September 2013
</abbr>
<address class="vcard author">
By <a class="url fn" href="/author/zheng-ruan.html">Zheng Ruan</a>
</address>
<p>In <a href="/category/gsoc.html">GSoC</a>. </p>
<p>tags: <a href="/tag/codon-alignment.html">Codon Alignment</a><a href="/tag/gsoc.html">GSoC</a></p></footer><!-- /.post-info --> <div class="section" id="things-have-been-done">
<h2>Things have been done</h2>
<p>This week I finished writing the test and documentation for <tt class="docutils literal">Bio.CodonAlign</tt>.
The test now covers all the aspects of <tt class="docutils literal">CodonAlign</tt>
(in <tt class="docutils literal">Tests/test_CodonAlign.py</tt>). Documentation for <tt class="docutils literal">CodonAlign</tt> can be found
in <tt class="docutils literal">Doc/doc*</tt>. It is available in <tt class="docutils literal">LaTeX</tt>, <tt class="docutils literal">pdf</tt>, <tt class="docutils literal">reStructuredText</tt> and
<tt class="docutils literal">HTML</tt> format. You may find the following links to be useful.</p>
<ul class="simple">
<li><a class="reference external" href="http://zruanweb.com/doc.html">HTML Format</a></li>
<li><a class="reference external" href="http://zruanweb.com/doc.rst">reStructuredText Format</a></li>
<li><a class="reference external" href="http://zruanweb.com/doc.pdf">PDF Version</a></li>
</ul>
</div>
<div class="section" id="documents">
<h2>Documents</h2>
<div class="section" id="chapter-xxx-codon-alignment">
<h3>Chapter XXX Codon Alignment</h3>
<p>This chapter is about Codon Alignments, which is a special case of nucleotide
alignment in which the trinucleotides correspond directly to amino acids in
the translated protein product. Codon Alignment carries information that can
be used for many evolutionary analysis.</p>
<p>This chapter has been divided into four parts to explain the codon alignment
support in Biopython. First, a general introduction about the basic classes
in <tt class="docutils literal">Bio.CodonAlign</tt> will be given. Then, a typical procedure of how to
obtain a codon alignment within Biopython is then discussed. Next, some
simple applications of codon alignment, such as dN/dS ratio estimation and
neutrality test and so forth will be covered. Finally, IO support of codon
alignment will help user to conduct analysis that cannot be done within
Biopython.</p>
<div class="section" id="x-1-codonseq-class">
<h4>X.1 CodonSeq Class</h4>
<p><tt class="docutils literal">Bio.CodonAlign.CodonSeq</tt> object is the base object in Codon Alignment. It
is similar to <tt class="docutils literal">Bio.Seq</tt> but with some extra attributes. To obtain a simple
<tt class="docutils literal">CodonSeq</tt> object, you just need to give a <tt class="docutils literal">str</tt> object of nucleotide
sequence whose length is a multiple of 3 (This can be violated if you have
<tt class="docutils literal">rf_table</tt> argument). For example:</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.CodonAlign</span> <span class="kn">import</span> <span class="n">CodonSeq</span>
<span class="o">>>></span> <span class="n">codon_seq</span> <span class="o">=</span> <span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAATTTCCCGGG"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_seq</span>
<span class="n">CodonSeq</span><span class="p">(</span><span class="s">'AAATTTCCCGGG'</span><span class="p">,</span> <span class="n">Gapped</span><span class="p">(</span><span class="n">CodonAlphabet</span><span class="p">(),</span> <span class="s">'-'</span><span class="p">))</span>
</pre>
<p>An error will raise up if the input sequence is not a multiple of 3.</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">codon_seq</span> <span class="o">=</span> <span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAATTTCCCGG"</span><span class="p">)</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="n">File</span> <span class="s">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="n">File</span> <span class="s">"/biopython/Bio/CodonAlign/CodonSeq.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">81</span><span class="p">,</span> <span class="ow">in</span> <span class="n">__init__</span>
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">%</span> <span class="mi">3</span> <span class="o">==</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"Sequence length is not a triple number"</span>
<span class="ne">AssertionError</span><span class="p">:</span> <span class="n">Sequence</span> <span class="n">length</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">a</span> <span class="n">triple</span> <span class="n">number</span>
</pre>
<p>By default, <tt class="docutils literal">Bio.CodonAlign.default_codon_alphabet</tt> will be assigned to
<tt class="docutils literal">CodonSeq</tt> object if you don't specify any Alphabet. This
<tt class="docutils literal">default_codon_alphabet</tt> is gapped universal genetic code, which will work
in most cases. However, if you are analyzing data from mitochondria, for
instance, and are in need of assigning an special codon alphabet by yourself,
<tt class="docutils literal">Bio.CodonAlign.CodonAlphabet</tt> also provides you an easy solution. All you
need is to pick up a <tt class="docutils literal">CodonTable</tt> object that is correct for your data.
For example:</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.CodonAlign</span> <span class="kn">import</span> <span class="n">CodonSeq</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.CodonAlign.CodonAlphabet</span> <span class="kn">import</span> <span class="n">get_codon_alphabet</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.Data.CodonTable</span> <span class="kn">import</span> <span class="n">generic_by_id</span>
<span class="c"># vertebrate mitochondria alphabet</span>
<span class="o">>>></span> <span class="n">codon_alphabet</span> <span class="o">=</span> <span class="n">get_codon_alphabet</span><span class="p">(</span><span class="n">generic_by_id</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">gap_char</span><span class="o">=</span><span class="s">"-"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_seq1</span> <span class="o">=</span> <span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAA---CCCGGG"</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">codon_alphabet</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_seq1</span>
<span class="n">CodonSeq</span><span class="p">(</span><span class="s">'AAA---CCCGGG'</span><span class="p">,</span> <span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Vertebrate</span> <span class="n">Mitochondrial</span><span class="p">))</span>
</pre>
<p>The slice of <tt class="docutils literal">CodonSeq</tt> is exactly the same with <tt class="docutils literal">Seq</tt> and it will always
return a <tt class="docutils literal">Seq</tt> object if you sliced a <tt class="docutils literal">CodonSeq</tt>. For example:</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">codon_seq1</span>
<span class="n">CodonSeq</span><span class="p">(</span><span class="s">'AAA---CCCGGG'</span><span class="p">,</span> <span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Vertebrate</span> <span class="n">Mitochondrial</span><span class="p">))</span>
<span class="o">>>></span> <span class="n">codon_seq1</span><span class="p">[:</span><span class="mi">6</span><span class="p">]</span>
<span class="n">Seq</span><span class="p">(</span><span class="s">'AAA---'</span><span class="p">,</span> <span class="n">DNAAlphabet</span><span class="p">())</span>
<span class="o">>>></span> <span class="n">codon_seq1</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span>
<span class="n">Seq</span><span class="p">(</span><span class="s">'AA--'</span><span class="p">,</span> <span class="n">DNAAlphabet</span><span class="p">())</span>
</pre>
<p>As you might imagine, <tt class="docutils literal">CodonSeq</tt> is able to be translated into amino acid
sequence based on the <tt class="docutils literal">CodonAlphabet</tt> within it. In fact, <tt class="docutils literal">CodonSeq</tt> does
more than this. <tt class="docutils literal">CodonSeq</tt> object has a <tt class="docutils literal">rf_table</tt> attribute that dictates
how the <tt class="docutils literal">CodonSeq</tt> will be translated (<tt class="docutils literal">rf_table</tt> will indicate the
starting position of each codon in the sequence). This is useful if you
sequence is known to have frameshift events or pseudogene that has insertion
or deletion. You might notice that in the previous example, you haven't
specify the <tt class="docutils literal">rf_table</tt> when initiate a <tt class="docutils literal">CodonSeq</tt> object. In fact,
<tt class="docutils literal">CodonSeq</tt> object will automatically assign a <tt class="docutils literal">rf_table</tt> to the
<tt class="docutils literal">CodonSeq</tt> if you don't say anything about it.</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">codon_seq1</span> <span class="o">=</span> <span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAACCCGGG"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_seq1</span>
<span class="n">CodonSeq</span><span class="p">(</span><span class="s">'AAACCCGGG'</span><span class="p">,</span> <span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">))</span>
<span class="o">>>></span> <span class="n">codon_seq1</span><span class="o">.</span><span class="n">rf_table</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
<span class="o">>>></span> <span class="n">codon_seq1</span><span class="o">.</span><span class="n">translate</span><span class="p">()</span>
<span class="s">'KPG'</span>
<span class="o">>>></span> <span class="n">codon_seq2</span> <span class="o">=</span> <span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAACCCGG"</span><span class="p">,</span> <span class="n">rf_table</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
<span class="o">>>></span> <span class="n">codon_seq2</span><span class="o">.</span><span class="n">rf_table</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
<span class="o">>>></span> <span class="n">codon_seq2</span><span class="o">.</span><span class="n">translate</span><span class="p">()</span>
<span class="s">'KPR'</span>
</pre>
<p>In the example, we didn't assign <tt class="docutils literal">rf_table</tt> to <tt class="docutils literal">codon_seq1</tt>. By default,
<tt class="docutils literal">CodonSeq</tt> will automatically generate a <tt class="docutils literal">rf_table</tt> to the coding sequence
assuming no frameshift events. In this case, it is <tt class="docutils literal">[0, 3, 6]</tt>, which means
the first codon in the sequence starts at position 0, the second codon in the
sequence starts at position 3, and the third codon in the sequence starts at
position 6. In <tt class="docutils literal">codon_seq2</tt>, we only have 8 nucleotides in the sequence, but
with <tt class="docutils literal">rf_table</tt> option specified. In this case, the third codon starts at
the 5th position of the sequence rather than the 6th. And the <tt class="docutils literal">translate()</tt>
function will use the <tt class="docutils literal">rf_table</tt> to get the translated amino acid sequence.</p>
<p>Another thing to keep in mind is that <tt class="docutils literal">rf_table</tt> will only be applied to
ungapped nucleotide sequence. This makes <tt class="docutils literal">rf_table</tt> to be interchangeable
between <tt class="docutils literal">CodonSeq</tt> with the same sequence but different gaps inserted. For
example,</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">codon_seq1</span> <span class="o">=</span> <span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAACCC---GGG"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_seq1</span><span class="o">.</span><span class="n">rf_table</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
<span class="o">>>></span> <span class="n">codon_seq1</span><span class="o">.</span><span class="n">translate</span><span class="p">()</span>
<span class="s">'KPG'</span>
<span class="o">>>></span> <span class="n">codon_seq1</span><span class="o">.</span><span class="n">full_translate</span><span class="p">()</span>
<span class="s">'KP-G'</span>
</pre>
<p>We can see that the <tt class="docutils literal">rf_table</tt> of <tt class="docutils literal">codon_seq1</tt> is still <tt class="docutils literal">[0, 3, 6]</tt>,
even though we have gaps added. The <tt class="docutils literal">translate()</tt> function will skip the
gaps and return the ungapped amino acid sequence. If gapped protein sequence
is what you need, <tt class="docutils literal">full_translate()</tt> comes to help.</p>
<p>It is also easy to convert <tt class="docutils literal">Seq</tt> object to <tt class="docutils literal">CodonSeq</tt> object, but it is
the user's responsibility to ensure all the necessary information is correct
for a <tt class="docutils literal">CodonSeq</tt> (mainly <tt class="docutils literal">rf_table</tt>).</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.Seq</span> <span class="kn">import</span> <span class="n">Seq</span>
<span class="o">>>></span> <span class="n">codon_seq</span> <span class="o">=</span> <span class="n">CodonSeq</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">seq</span> <span class="o">=</span> <span class="n">Seq</span><span class="p">(</span><span class="s">'AAAAAA'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_seq</span><span class="o">.</span><span class="n">from_seq</span><span class="p">(</span><span class="n">seq</span><span class="p">)</span>
<span class="n">CodonSeq</span><span class="p">(</span><span class="s">'AAAAAA'</span><span class="p">,</span> <span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">))</span>
<span class="o">>>></span> <span class="n">seq</span> <span class="o">=</span> <span class="n">Seq</span><span class="p">(</span><span class="s">'AAAAA'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_seq</span><span class="o">.</span><span class="n">from_seq</span><span class="p">(</span><span class="n">seq</span><span class="p">)</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="n">File</span> <span class="s">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="n">File</span> <span class="s">"/biopython/Bio/CodonAlign/CodonSeq.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">264</span><span class="p">,</span> <span class="ow">in</span> <span class="n">from_seq</span>
<span class="k">return</span> <span class="n">cls</span><span class="p">(</span><span class="n">seq</span><span class="o">.</span><span class="n">_data</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">alphabet</span><span class="p">)</span>
<span class="n">File</span> <span class="s">"/biopython/Bio/CodonAlign/CodonSeq.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">80</span><span class="p">,</span> <span class="ow">in</span> <span class="n">__init__</span>
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">%</span> <span class="mi">3</span> <span class="o">==</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"Sequence length is not a triple number"</span>
<span class="ne">AssertionError</span><span class="p">:</span> <span class="n">Sequence</span> <span class="n">length</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">a</span> <span class="n">triple</span> <span class="n">number</span>
<span class="o">>>></span> <span class="n">codon_seq</span><span class="o">.</span><span class="n">from_seq</span><span class="p">(</span><span class="n">seq</span><span class="p">,</span> <span class="n">rf_table</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
<span class="n">CodonSeq</span><span class="p">(</span><span class="s">'AAAAA'</span><span class="p">,</span> <span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">))</span>
</pre>
</div>
<div class="section" id="x-2-codonalignment-class">
<h4>X.2 CodonAlignment Class</h4>
<p>The <tt class="docutils literal">CodonAlignment</tt> class is another new class in <tt class="docutils literal">Codon.Align</tt>. It's
aim is to store codon alignment data and apply various analysis upon it.
Similar to <tt class="docutils literal">MultipleSeqAlignment</tt>, you can use numpy style slice to a
<tt class="docutils literal">CodonAlignment</tt>. However, once you sliced, the returned result will
always be a <tt class="docutils literal">MultipleSeqAlignment</tt> object.</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.CodonAlign</span> <span class="kn">import</span> <span class="n">default_codon_alphabet</span><span class="p">,</span> <span class="n">CodonSeq</span><span class="p">,</span> <span class="n">CodonAlignment</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.Alphabet</span> <span class="kn">import</span> <span class="n">generic_dna</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.SeqRecord</span> <span class="kn">import</span> <span class="n">SeqRecord</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.Alphabet</span> <span class="kn">import</span> <span class="n">IUPAC</span><span class="p">,</span> <span class="n">Gapped</span>
<span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAAACGTCG"</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">default_codon_alphabet</span><span class="p">),</span> <span class="nb">id</span><span class="o">=</span><span class="s">"Alpha"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">b</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAA---TCG"</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">default_codon_alphabet</span><span class="p">),</span> <span class="nb">id</span><span class="o">=</span><span class="s">"Beta"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">c</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">CodonSeq</span><span class="p">(</span><span class="s">"AAAAGGTGG"</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">default_codon_alphabet</span><span class="p">),</span> <span class="nb">id</span><span class="o">=</span><span class="s">"Gamma"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_aln</span> <span class="o">=</span> <span class="n">CodonAlignment</span><span class="p">([</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">])</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">codon_aln</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">CodonAlignment</span> <span class="k">with</span> <span class="mi">3</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">9</span> <span class="n">columns</span> <span class="p">(</span><span class="mi">3</span> <span class="n">codons</span><span class="p">)</span>
<span class="n">AAAACGTCG</span> <span class="n">Alpha</span>
<span class="n">AAA</span><span class="o">---</span><span class="n">TCG</span> <span class="n">Beta</span>
<span class="n">AAAAGGTGG</span> <span class="n">Gamma</span>
<span class="o">>>></span> <span class="n">codon_aln</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">ID</span><span class="p">:</span> <span class="n">Alpha</span>
<span class="n">Name</span><span class="p">:</span> <span class="o"><</span><span class="n">unknown</span> <span class="n">name</span><span class="o">></span>
<span class="n">Description</span><span class="p">:</span> <span class="o"><</span><span class="n">unknown</span> <span class="n">description</span><span class="o">></span>
<span class="n">Number</span> <span class="n">of</span> <span class="n">features</span><span class="p">:</span> <span class="mi">0</span>
<span class="n">CodonSeq</span><span class="p">(</span><span class="s">'AAAACGTCG'</span><span class="p">,</span> <span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">))</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">codon_aln</span><span class="p">[:,</span> <span class="mi">3</span><span class="p">]</span>
<span class="n">A</span><span class="o">-</span><span class="n">A</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">codon_aln</span><span class="p">[</span><span class="mi">1</span><span class="p">:,</span> <span class="mi">3</span><span class="p">:</span><span class="mi">10</span><span class="p">]</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">alignment</span> <span class="k">with</span> <span class="mi">2</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">6</span> <span class="n">columns</span>
<span class="o">---</span><span class="n">TCG</span> <span class="n">Beta</span>
<span class="n">AGGTGG</span> <span class="n">Gamma</span>
</pre>
<p>You can write out <tt class="docutils literal">CodonAlignment</tt> object just as what you do with
<tt class="docutils literal">MultipleSeqAlignment</tt>.</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio</span> <span class="kn">import</span> <span class="n">AlignIO</span>
<span class="o">>>></span> <span class="n">AlignIO</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">codon_aln</span><span class="p">,</span> <span class="s">'example.aln'</span><span class="p">,</span> <span class="s">'clustal'</span><span class="p">)</span>
</pre>
<p>An alignment file called <tt class="docutils literal">example.aln</tt> can then be found in your current
working directory. You can write <tt class="docutils literal">CodonAlignment</tt> out in any MSA format
that Biopython supports.</p>
<p>Currently, you are not able to read MSA data as a <tt class="docutils literal">CodonAlignment</tt> object
directly (because of dealing with <tt class="docutils literal">rf_table</tt> issue for each sequence).
However, you can read the alignment data in as a <tt class="docutils literal">MultipleSeqAlignment</tt>
object and convert them into <tt class="docutils literal">CodonAlignment</tt> object using <tt class="docutils literal">from_msa()</tt>
class method. For example,</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">aln</span> <span class="o">=</span> <span class="n">AlignIO</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s">'example.aln'</span><span class="p">,</span> <span class="s">'clustal'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">codon_aln</span> <span class="o">=</span> <span class="n">CodonAlignment</span><span class="p">()</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">codon_aln</span><span class="o">.</span><span class="n">from_msa</span><span class="p">(</span><span class="n">aln</span><span class="p">)</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">CodonAlignment</span> <span class="k">with</span> <span class="mi">3</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">9</span> <span class="n">columns</span> <span class="p">(</span><span class="mi">3</span> <span class="n">codons</span><span class="p">)</span>
<span class="n">AAAACGTCG</span> <span class="n">Alpha</span>
<span class="n">AAA</span><span class="o">---</span><span class="n">TCG</span> <span class="n">Beta</span>
<span class="n">AAAAGGTGG</span> <span class="n">Gamma</span>
</pre>
<p>Note, the <tt class="docutils literal">from_msa()</tt> method assume there is no frameshift events occurs
in your alignment. Its behavior is not guaranteed if your sequence contain
frameshift events!!</p>
<p>There is a couple of methods that can be applied to <tt class="docutils literal">CodonAlignment</tt> class
for evolutionary analysis. We will cover them more in X.4.</p>
</div>
<div class="section" id="x-3-build-a-codon-alignment">
<h4>X.3 Build a Codon Alignment</h4>
<p>Building a codon alignment is the first step of many evolutionary anaysis.
But how to do that? <tt class="docutils literal">Bio.CodonAlign</tt> provides you an easy funciton
<tt class="docutils literal">build()</tt> to achieve all. The data you need to prepare in advance is a
protein alignment and a set of DNA sequences that can be translated into the
protein sequences in the alignment.</p>
<p><tt class="docutils literal">CodonAlign.build</tt> method requires two mandatory arguments. The first one
should be a protein <tt class="docutils literal">MultipleSeqAlignment</tt> object and the second one is a
list of nucleotide <tt class="docutils literal">SeqRecord</tt> object. By default, <tt class="docutils literal">CodonAlign.build</tt>
assumes the order of the alignment and nucleotide sequences are in the same.
For example:</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio</span> <span class="kn">import</span> <span class="n">CodonAlign</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.Alphabet</span> <span class="kn">import</span> <span class="n">IUPAC</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.Align</span> <span class="kn">import</span> <span class="n">MultipleSeqAlignment</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.SeqRecord</span> <span class="kn">import</span> <span class="n">SeqRecord</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.Seq</span> <span class="kn">import</span> <span class="n">Seq</span>
<span class="o">>>></span> <span class="n">nucl1</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'AAATTTCCCGGG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl1'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">nucl2</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'AAATTACCCGCG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl2'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">nucl3</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'ATATTACCCGGG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl3'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot1</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl1</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot1'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot2</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl2</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot2'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot3</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl3</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot3'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">aln</span> <span class="o">=</span> <span class="n">MultipleSeqAlignment</span><span class="p">([</span><span class="n">prot1</span><span class="p">,</span> <span class="n">prot2</span><span class="p">,</span> <span class="n">prot3</span><span class="p">])</span>
<span class="o">>>></span> <span class="n">codon_aln</span> <span class="o">=</span> <span class="n">CodonAlign</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">aln</span><span class="p">,</span> <span class="p">[</span><span class="n">nucl1</span><span class="p">,</span> <span class="n">nucl2</span><span class="p">,</span> <span class="n">nucl3</span><span class="p">])</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">codon_aln</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">CodonAlignment</span> <span class="k">with</span> <span class="mi">3</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">12</span> <span class="n">columns</span> <span class="p">(</span><span class="mi">4</span> <span class="n">codons</span><span class="p">)</span>
<span class="n">AAATTTCCCGGG</span> <span class="n">nucl1</span>
<span class="n">AAATTACCCGCG</span> <span class="n">nucl2</span>
<span class="n">ATATTACCCGGG</span> <span class="n">nucl3</span>
</pre>
<p>In the above example, <tt class="docutils literal">CodonAlign.build</tt> will try to match <tt class="docutils literal">nucl1</tt> with
<tt class="docutils literal">prot1</tt>, <tt class="docutils literal">nucl2</tt> with <tt class="docutils literal">prot2</tt> and <tt class="docutils literal">nucl3</tt> with <tt class="docutils literal">prot3</tt>, i.e.,
assuming the order of records in <tt class="docutils literal">aln</tt> and <tt class="docutils literal">[nucl1, nucl2, nucl3]</tt> is the
same.</p>
<p><tt class="docutils literal">CodonAlign.build</tt> method is also able to handle key match. In this case,
records with same id are paired. For example:</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">nucl1</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'AAATTTCCCGGG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl1'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">nucl2</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'AAATTACCCGCG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl2'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">nucl3</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'ATATTACCCGGG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl3'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot1</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl1</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot1'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot2</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl2</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot2'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot3</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl3</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot3'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">aln</span> <span class="o">=</span> <span class="n">MultipleSeqAlignment</span><span class="p">([</span><span class="n">prot1</span><span class="p">,</span> <span class="n">prot2</span><span class="p">,</span> <span class="n">prot3</span><span class="p">])</span>
<span class="o">>>></span> <span class="n">nucl</span> <span class="o">=</span> <span class="p">{</span><span class="s">'prot1'</span><span class="p">:</span> <span class="n">nucl1</span><span class="p">,</span> <span class="s">'prot2'</span><span class="p">:</span> <span class="n">nucl2</span><span class="p">,</span> <span class="s">'prot3'</span><span class="p">:</span> <span class="n">nucl3</span><span class="p">}</span>
<span class="o">>>></span> <span class="n">codon_aln</span> <span class="o">=</span> <span class="n">CodonAlign</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">aln</span><span class="p">,</span> <span class="n">nucl</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">codon_aln</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">CodonAlignment</span> <span class="k">with</span> <span class="mi">3</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">12</span> <span class="n">columns</span> <span class="p">(</span><span class="mi">4</span> <span class="n">codons</span><span class="p">)</span>
<span class="n">AAATTTCCCGGG</span> <span class="n">nucl1</span>
<span class="n">AAATTACCCGCG</span> <span class="n">nucl2</span>
<span class="n">ATATTACCCGGG</span> <span class="n">nucl3</span>
</pre>
<p>This option is handleful if you read nucleotide sequences using <tt class="docutils literal">SeqIO.index</tt>
method, in which case the nucleotide dict with be generated automatically.</p>
<p>Sometimes, you are neither not able to ensure the same order or the same id.
<tt class="docutils literal">CodonAlign.build</tt> method provides you an manual approach to tell the
program nucleotide sequence and protein sequence correspondance by generating
a <tt class="docutils literal">corr_dict</tt>. <tt class="docutils literal">corr_dict</tt> should be a dictionary that uses protein record
id as key and nucleotide record id as item. Let's look at an example:</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">nucl1</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'AAATTTCCCGGG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl1'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">nucl2</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'AAATTACCCGCG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl2'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">nucl3</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">Seq</span><span class="p">(</span><span class="s">'ATATTACCCGGG'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">()),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'nucl3'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot1</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl1</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot1'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot2</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl2</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot2'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">prot3</span> <span class="o">=</span> <span class="n">SeqRecord</span><span class="p">(</span><span class="n">nucl3</span><span class="o">.</span><span class="n">seq</span><span class="o">.</span><span class="n">translate</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s">'prot3'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">aln</span> <span class="o">=</span> <span class="n">MultipleSeqAlignment</span><span class="p">([</span><span class="n">prot1</span><span class="p">,</span> <span class="n">prot2</span><span class="p">,</span> <span class="n">prot3</span><span class="p">])</span>
<span class="o">>>></span> <span class="n">corr_dict</span> <span class="o">=</span> <span class="p">{</span><span class="s">'prot1'</span><span class="p">:</span> <span class="s">'nucl1'</span><span class="p">,</span> <span class="s">'prot2'</span><span class="p">:</span> <span class="s">'nucl2'</span><span class="p">,</span> <span class="s">'prot3'</span><span class="p">:</span> <span class="s">'nucl3'</span><span class="p">}</span>
<span class="o">>>></span> <span class="n">codon_aln</span> <span class="o">=</span> <span class="n">CodonAlign</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">aln</span><span class="p">,</span> <span class="p">[</span><span class="n">nucl3</span><span class="p">,</span> <span class="n">nucl1</span><span class="p">,</span> <span class="n">nucl2</span><span class="p">],</span> <span class="n">corr_dict</span><span class="o">=</span><span class="n">corr_dict</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">codon_aln</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">CodonAlignment</span> <span class="k">with</span> <span class="mi">3</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">12</span> <span class="n">columns</span> <span class="p">(</span><span class="mi">4</span> <span class="n">codons</span><span class="p">)</span>
<span class="n">AAATTTCCCGGG</span> <span class="n">nucl1</span>
<span class="n">AAATTACCCGCG</span> <span class="n">nucl2</span>
<span class="n">ATATTACCCGGG</span> <span class="n">nucl3</span>
</pre>
<p>We can see, even though the second argument of <tt class="docutils literal">CodonAlign.build</tt> is not in
the same order with <tt class="docutils literal">aln</tt> in the above example, the <tt class="docutils literal">corr_dict</tt> tells the
program to pair protein records and nucleotide records. And we are still able
to obtain the correct <tt class="docutils literal">CodonAlignment</tt> object.</p>
<p>The underlying algorithm of <tt class="docutils literal">CodonAlign.build</tt> method is very similar to
<tt class="docutils literal">pal2nal</tt> (a very famous perl script to build codon alignment).
<tt class="docutils literal">CodonAlign.build</tt> will first translate protein sequences into a long
degenerate regular expression and tries to find a match in its corresponding
nucleotide sequence. When translation fails, it divide protein sequence into
several small anchors and tries to match each anchor to the nucleotide sequence
to figure out where the mismatch and frameshift events lie. Other options
available for <tt class="docutils literal">CodonAlign.build</tt> includes <tt class="docutils literal">anchor_len</tt> (default 10) and
<tt class="docutils literal">max_score</tt> (maximum tolerance of unexpected events, default 10). You may
want to refer the Biopython build-in help to get more information about these
options.</p>
<p>Now let's look at a real example of building codon alignment. Here we will use
epidermal growth factor (EGFR) gene to demonstrate how to obtain codon
alignment. To reduce your effort, we have already collected EGFR sequences for
<cite>Homo sapiens</cite>, <cite>Bos taurus</cite>, <cite>Rattus norvegicus</cite>, <cite>Sus scrofa</cite> and
<cite>Drosophila melanogaster</cite>. You can download them from here (
<a class="reference external" href="http://zruanweb.com/egfr.zip">egfr.zip</a>).
Uncomressing the <tt class="docutils literal">.zip</tt>, you will see three files. <tt class="docutils literal">egfr_nucl.fa</tt> is
nucleotide sequences of EGFR and <tt class="docutils literal">egfr_pro.aln</tt> is EGFR protein sequence
alignment in <tt class="docutils literal">clustal</tt> format. The <tt class="docutils literal">egfr_id</tt> contains id correspondance
between protein records and nucleotide records. You can then try the following
code (make sure the files are in your current python working directory):</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio</span> <span class="kn">import</span> <span class="n">SeqIO</span><span class="p">,</span> <span class="n">AlignIO</span>
<span class="o">>>></span> <span class="n">nucl</span> <span class="o">=</span> <span class="n">SeqIO</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s">'egfr_nucl.fa'</span><span class="p">,</span> <span class="s">'fasta'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">())</span>
<span class="o">>>></span> <span class="n">prot</span> <span class="o">=</span> <span class="n">AlignIO</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s">'egfr_pro.aln'</span><span class="p">,</span> <span class="s">'clustal'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">protein</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">id_corr</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="o">.</span><span class="n">split</span><span class="p">()[</span><span class="mi">0</span><span class="p">]:</span> <span class="n">i</span><span class="o">.</span><span class="n">split</span><span class="p">()[</span><span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">open</span><span class="p">(</span><span class="s">'egfr_id'</span><span class="p">)</span><span class="o">.</span><span class="n">readlines</span><span class="p">()}</span>
<span class="o">>>></span> <span class="n">aln</span> <span class="o">=</span> <span class="n">CodonAlign</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">prot</span><span class="p">,</span> <span class="n">nucl</span><span class="p">,</span> <span class="n">corr_dict</span><span class="o">=</span><span class="n">id_corr</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">CodonAlign</span><span class="o">.</span><span class="n">default_codon_alphabet</span><span class="p">)</span>
<span class="o">/</span><span class="n">biopython</span><span class="o">/</span><span class="n">Bio</span><span class="o">/</span><span class="n">CodonAlign</span><span class="o">/</span><span class="n">__init__</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">568</span><span class="p">:</span> <span class="ne">UserWarning</span><span class="p">:</span> <span class="n">gi</span><span class="o">|</span><span class="mi">47522840</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NP_999172</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="p">(</span><span class="n">L</span> <span class="mi">449</span><span class="p">)</span> <span class="n">does</span> <span class="ow">not</span> <span class="n">correspond</span> <span class="n">to</span> <span class="n">gi</span><span class="o">|</span><span class="mi">47522839</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_214007</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="p">(</span><span class="n">ATG</span><span class="p">)</span>
<span class="o">%</span> <span class="p">(</span><span class="n">pro</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="n">aa</span><span class="p">,</span> <span class="n">aa_num</span><span class="p">,</span> <span class="n">nucl</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="n">this_codon</span><span class="p">))</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">aln</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">CodonAlignment</span> <span class="k">with</span> <span class="mi">6</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">4446</span> <span class="n">columns</span> <span class="p">(</span><span class="mi">1482</span> <span class="n">codons</span><span class="p">)</span>
<span class="n">ATGATGATTATCAGCATGTGGATGAGCATATCGCGAGGATTGTGGGACAGCAGCTCC</span><span class="o">...</span><span class="n">GTG</span> <span class="n">gi</span><span class="o">|</span><span class="mi">24657088</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_057410</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span>
<span class="o">---------------------</span><span class="n">ATGCTGCTGCGACGGCGCAACGGCCCCTGCCCCTTC</span><span class="o">...</span><span class="n">GTG</span> <span class="n">gi</span><span class="o">|</span><span class="mi">24657104</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_057411</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span>
<span class="o">------------------------------</span><span class="n">ATGAAAAAGCACGAG</span><span class="o">------------...</span><span class="n">GCC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">302179500</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">HM749883</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="o">------------------------------</span><span class="n">ATGCGACGCTCCTGGGCGGGCGGCGCC</span><span class="o">...</span><span class="n">GCA</span> <span class="n">gi</span><span class="o">|</span><span class="mi">47522839</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_214007</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="o">------------------------------</span><span class="n">ATGCGACCCTCCGGGACGGCCGGGGCA</span><span class="o">...</span><span class="n">GCA</span> <span class="n">gi</span><span class="o">|</span><span class="mi">41327737</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_005228</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span>
<span class="o">------------------------------</span><span class="n">ATGCGACCCTCAGGGACTGCGAGAACC</span><span class="o">...</span><span class="n">GCA</span> <span class="n">gi</span><span class="o">|</span><span class="mi">6478867</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M37394</span><span class="o">.</span><span class="mi">2</span><span class="o">|</span><span class="n">RATEGFR</span>
</pre>
<p>We can see, while building the codon alignment a mismatch event is found. And
this is shown as a UserWarning.</p>
</div>
<div class="section" id="x-4-codon-alignment-application">
<h4>X.4 Codon Alignment Application</h4>
<p>The most important application of codon alignment is to estimate
nonsynonymous substitutions per site (dN) and synonymous substitutions per
site (dS). <tt class="docutils literal">CodonAlign</tt> currently support three counting based methods
(NG86, LWL85, YN00) and maximum likelihood method to estimate dN and dS.
The function to conduct dN, dS estimation is called <tt class="docutils literal">cal_dn_ds</tt>. When you
obtained a codon alignment, it is quite easy to calculate dN and dS. For
example (assuming you have EGFR codon alignmnet in the python working
space):</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.CodonAlign.CodonSeq</span> <span class="kn">import</span> <span class="n">cal_dn_ds</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">aln</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">CodonAlignment</span> <span class="k">with</span> <span class="mi">6</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">4446</span> <span class="n">columns</span> <span class="p">(</span><span class="mi">1482</span> <span class="n">codons</span><span class="p">)</span>
<span class="n">ATGATGATTATCAGCATGTGGATGAGCATATCGCGAGGATTGTGGGACAGCAGCTCC</span><span class="o">...</span><span class="n">GTG</span> <span class="n">gi</span><span class="o">|</span><span class="mi">24657088</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_057410</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span>
<span class="o">---------------------</span><span class="n">ATGCTGCTGCGACGGCGCAACGGCCCCTGCCCCTTC</span><span class="o">...</span><span class="n">GTG</span> <span class="n">gi</span><span class="o">|</span><span class="mi">24657104</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_057411</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span>
<span class="o">------------------------------</span><span class="n">ATGAAAAAGCACGAG</span><span class="o">------------...</span><span class="n">GCC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">302179500</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">HM749883</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="o">------------------------------</span><span class="n">ATGCGACGCTCCTGGGCGGGCGGCGCC</span><span class="o">...</span><span class="n">GCA</span> <span class="n">gi</span><span class="o">|</span><span class="mi">47522839</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_214007</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="o">------------------------------</span><span class="n">ATGCGACCCTCCGGGACGGCCGGGGCA</span><span class="o">...</span><span class="n">GCA</span> <span class="n">gi</span><span class="o">|</span><span class="mi">41327737</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_005228</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span>
<span class="o">------------------------------</span><span class="n">ATGCGACCCTCAGGGACTGCGAGAACC</span><span class="o">...</span><span class="n">GCA</span> <span class="n">gi</span><span class="o">|</span><span class="mi">6478867</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M37394</span><span class="o">.</span><span class="mi">2</span><span class="o">|</span><span class="n">RATEGFR</span>
<span class="o">>>></span> <span class="n">dN</span><span class="p">,</span> <span class="n">dS</span> <span class="o">=</span> <span class="n">cal_dn_ds</span><span class="p">(</span><span class="n">aln</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">aln</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">method</span><span class="o">=</span><span class="s">'NG86'</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">dN</span><span class="p">,</span> <span class="n">dS</span>
<span class="mf">0.0209078305058</span> <span class="mf">0.0178371876389</span>
<span class="o">>>></span> <span class="n">dN</span><span class="p">,</span> <span class="n">dS</span> <span class="o">=</span> <span class="n">cal_dn_ds</span><span class="p">(</span><span class="n">aln</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">aln</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">method</span><span class="o">=</span><span class="s">'LWL95'</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">dN</span><span class="p">,</span> <span class="n">dS</span>
<span class="mf">0.0203061425453</span> <span class="mf">0.0163935691992</span>
<span class="o">>>></span> <span class="n">dN</span><span class="p">,</span> <span class="n">dS</span> <span class="o">=</span> <span class="n">cal_dn_ds</span><span class="p">(</span><span class="n">aln</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">aln</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">method</span><span class="o">=</span><span class="s">'YN00'</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">dN</span><span class="p">,</span> <span class="n">dS</span>
<span class="mf">0.0198195580321</span> <span class="mf">0.0221560648799</span>
<span class="o">>>></span> <span class="n">dN</span><span class="p">,</span> <span class="n">dS</span> <span class="o">=</span> <span class="n">cal_dn_ds</span><span class="p">(</span><span class="n">aln</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">aln</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">method</span><span class="o">=</span><span class="s">'ML'</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">dN</span><span class="p">,</span> <span class="n">dS</span>
<span class="mf">0.0193877676103</span> <span class="mf">0.0217247139962</span>
</pre>
<p>If you are using maximum likelihood methdo to estimate dN and dS, you are
also able to specify equilibrium codon frequency to <tt class="docutils literal">cfreq</tt> argument.
Available options include <tt class="docutils literal">F1x4</tt>, <tt class="docutils literal">F3x4</tt> and <tt class="docutils literal">F61</tt>.</p>
<p>It is also possible to get dN and dS matrix or a tree from a <tt class="docutils literal">CodonAlignment</tt> object.</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">dn_matrix</span><span class="p">,</span> <span class="n">ds_matrix</span> <span class="o">=</span> <span class="n">aln</span><span class="o">.</span><span class="n">get_dn_ds_matrxi</span><span class="p">()</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">dn_matrix</span>
<span class="n">gi</span><span class="o">|</span><span class="mi">24657088</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_057410</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span> <span class="mi">0</span>
<span class="n">gi</span><span class="o">|</span><span class="mi">24657104</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_057411</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span> <span class="mf">0.0209078305058</span> <span class="mi">0</span>
<span class="n">gi</span><span class="o">|</span><span class="mi">302179500</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">HM749883</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span> <span class="mf">0.611523924924</span> <span class="mf">0.61022032668</span> <span class="mi">0</span>
<span class="n">gi</span><span class="o">|</span><span class="mi">47522839</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_214007</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span> <span class="mf">0.614035083563</span> <span class="mf">0.60401686212</span> <span class="mf">0.0411803504059</span> <span class="mi">0</span>
<span class="n">gi</span><span class="o">|</span><span class="mi">41327737</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_005228</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span> <span class="mf">0.61415325314</span> <span class="mf">0.60182631356</span> <span class="mf">0.0670105144563</span> <span class="mf">0.0614703609541</span> <span class="mi">0</span>
<span class="n">gi</span><span class="o">|</span><span class="mi">6478867</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M37394</span><span class="o">.</span><span class="mi">2</span><span class="o">|</span><span class="n">RATEGFR</span> <span class="mf">0.61870883409</span> <span class="mf">0.606868724887</span> <span class="mf">0.0738690303483</span> <span class="mf">0.0735789092792</span> <span class="mf">0.0517984707257</span> <span class="mi">0</span>
<span class="n">gi</span><span class="o">|</span><span class="mi">24657088</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_057410</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span> <span class="n">gi</span><span class="o">|</span><span class="mi">24657104</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_057411</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span> <span class="n">gi</span><span class="o">|</span><span class="mi">302179500</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">HM749883</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span> <span class="n">gi</span><span class="o">|</span><span class="mi">47522839</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_214007</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span> <span class="n">gi</span><span class="o">|</span><span class="mi">41327737</span><span class="o">|</span><span class="n">ref</span><span class="o">|</span><span class="n">NM_005228</span><span class="o">.</span><span class="mi">3</span><span class="o">|</span> <span class="n">gi</span><span class="o">|</span><span class="mi">6478867</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M37394</span><span class="o">.</span><span class="mi">2</span><span class="o">|</span><span class="n">RATEGFR</span>
<span class="o">>>></span> <span class="n">dn_tree</span><span class="p">,</span> <span class="n">ds_tree</span> <span class="o">=</span> <span class="n">aln</span><span class="o">.</span><span class="n">get_dn_ds_tree</span><span class="p">()</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">dn_tree</span>
<span class="n">Tree</span><span class="p">(</span><span class="n">rooted</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'Inner5'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.279185347322</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'Inner4'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.00859186651689</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'Inner3'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.0258992353629</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'gi|6478867|gb|M37394.2|RATEGFR'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.0258992353629</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'gi|41327737|ref|NM_005228.3|'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.0139009266768</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'Inner2'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.020590175203</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'gi|47522839|ref|NM_214007.1|'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.020590175203</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'gi|302179500|gb|HM749883.1|'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.294630667432</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'Inner1'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.0104539152529</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'gi|24657104|ref|NM_057411.3|'</span><span class="p">)</span>
<span class="n">Clade</span><span class="p">(</span><span class="n">branch_length</span><span class="o">=</span><span class="mf">0.0104539152529</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'gi|24657088|ref|NM_057410.3|'</span><span class="p">)</span>
</pre>
<p>Another application of codon alignment that <tt class="docutils literal">CodonAlign</tt> supports is
Mcdonald-Kreitman test. This test compares the within species synonymous
substitutions and nonsynonymous substitutions and between species synonymous
substitutions and nonsynonymous substitutions to see if they are from the same
evolutionary process. The test requires gene sequences sampled from different
individuals of the same species. In the following example, we will use <cite>Adh</cite>
gene from fluit fly to demonstrate how to conduct the test. The data includes
11 individuals from <cite>D. melanogaster</cite>, 4 individuals from <cite>D. simulans</cite> and
12 individuals from <cite>D. yakuba</cite>. The data is available from here
<a class="reference external" href="http://zruanweb.com/adh.zip">adh.zip</a>. A function called <tt class="docutils literal">mktest</tt> will be
used for the test.</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio</span> <span class="kn">import</span> <span class="n">SeqIO</span><span class="p">,</span> <span class="n">AlignIO</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.Alphabet</span> <span class="kn">import</span> <span class="n">IUPAC</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.CodonAlign</span> <span class="kn">import</span> <span class="n">build</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">Bio.CodonAlign.CodonAlignment</span> <span class="kn">import</span> <span class="n">mktest</span>
<span class="o">>>></span> <span class="n">pro_aln</span> <span class="o">=</span> <span class="n">AlignIO</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s">'adh.aln'</span><span class="p">,</span> <span class="s">'clustal'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">protein</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">p</span> <span class="o">=</span> <span class="n">SeqIO</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="s">'drosophilla.fasta'</span><span class="p">,</span> <span class="s">'fasta'</span><span class="p">,</span> <span class="n">alphabet</span><span class="o">=</span><span class="n">IUPAC</span><span class="o">.</span><span class="n">IUPACUnambiguousDNA</span><span class="p">())</span>
<span class="o">>>></span> <span class="n">codon_aln</span> <span class="o">=</span> <span class="n">build</span><span class="p">(</span><span class="n">pro_aln</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">codon_aln</span>
<span class="n">CodonAlphabet</span><span class="p">(</span><span class="n">Standard</span><span class="p">)</span> <span class="n">CodonAlignment</span> <span class="k">with</span> <span class="mi">27</span> <span class="n">rows</span> <span class="ow">and</span> <span class="mi">768</span> <span class="n">columns</span> <span class="p">(</span><span class="mi">256</span> <span class="n">codons</span><span class="p">)</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9217</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57365</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9219</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57366</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9221</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57367</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9223</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57368</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9225</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57369</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9227</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57370</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9229</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57371</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9231</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57372</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9233</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57373</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9235</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57374</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9237</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57375</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACCTTGACCAACAAGAACGTGGTTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9239</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57376</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACTTTGACCAACAAGAACGTGATTTTCGTTGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9097</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57361</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACTTTGACCAACAAGAACGTGATTTTCGTTGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9099</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57362</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACTTTGACCAACAAGAACGTGATTTTCGTTGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9101</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57363</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGGCGTTTACTTTGACCAACAAGAACGTGATTTTCGTTGCCGGTCTGGGAGGCATC</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">9103</span><span class="o">|</span><span class="n">emb</span><span class="o">|</span><span class="n">X57364</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156879</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17837</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCK</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156877</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17836</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCJ</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156875</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17835</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCI</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156873</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17834</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCH</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156871</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17833</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCG</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTTGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156863</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M19547</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCC</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156869</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17832</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCF</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTGGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156867</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17831</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCE</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTTGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156865</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17830</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCD</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTTGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156861</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17828</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCB</span>
<span class="n">ATGTCGTTTACTTTGACCAACAAGAACGTGATTTTCGTTGCCGGTCTGGGAGGCATT</span><span class="o">...</span><span class="n">ATC</span> <span class="n">gi</span><span class="o">|</span><span class="mi">156859</span><span class="o">|</span><span class="n">gb</span><span class="o">|</span><span class="n">M17827</span><span class="o">.</span><span class="mi">1</span><span class="o">|</span><span class="n">DROADHCA</span>
<span class="o">>>></span> <span class="k">print</span> <span class="n">mktest</span><span class="p">([</span><span class="n">codon_aln</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">12</span><span class="p">],</span> <span class="n">codon_aln</span><span class="p">[</span><span class="mi">12</span><span class="p">:</span><span class="mi">16</span><span class="p">],</span> <span class="n">codon_aln</span><span class="p">[</span><span class="mi">16</span><span class="p">:]])</span>
<span class="mf">0.00206457257254</span>
</pre>
<p>In the above example, <tt class="docutils literal">codon_aln[1:12]</tt> belongs to <cite>D. melanogaster</cite>,
<tt class="docutils literal">codon_aln[12:16]</tt> belongs to <cite>D. simulans</cite> and <tt class="docutils literal">codon_aln[16:]</tt> belongs
to <cite>D. yakuba</cite>. <tt class="docutils literal">mktest</tt> will return the p-value of the test. We can see
in this case, 0.00206 << 0.01, therefore, the gene is under strong negative
selection according to MK test.</p>
</div>
<div class="section" id="x-4-future-development">
<h4>X.4 Future Development</h4>
<p>Because of the limited time frame for Google Summer of Code project, some of
the functions in <tt class="docutils literal">CodonAlign</tt> is not tested comprehensively. In the
following days, I will continue perfect the code and several new features
will be added. I am always welcome to hear your suggestions and feature
request. You are also highly encouraged to contribute to the existing code.
Please do not hesitable to email me (zruan1991 at gmail dot com) when you have
novel ideas that can make the code better.</p>
</div>
</div>
</div>
<div class="section" id="acknowledgement">
<h2>Acknowledgement</h2>
<p>I acknowledge all the NESCent Phyloinformatics group and Biopythoners who are
interested in my project and provide feedbacks in the past two months. Special
thanks are given to Eric Talevich and Peter Cock, for their helps and valuable
suggestions while I'm coding in GSoC. I will also thank my academic advisor --
Natarajan Kannan to respect my personal interest and allow me to code for Open
Source Software.</p>
<p>For more info, please visit my
[CodonAlign github repo](<a class="reference external" href="https://github.com/zruan/biopython/tree/master/Bio/CodonAlign">https://github.com/zruan/biopython/tree/master/Bio/CodonAlign</a>)</p>
</div>
</div><!-- /.entry-content -->
<div class="comments">
<h2>Comments !</h2>
<div id="disqus_thread"></div>
<script type="text/javascript">
var disqus_identifier = "11th-diary.html";
(function() {
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
dsq.src = 'http://zruansblog.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
</div>
</article>
</section>
<section id="extras" class="body">
<div class="blogroll">
<h2>blogroll</h2>
<ul>
<li><a href="http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013#Codon_Alignment_and_Analysis_in_Biopython">NESCent GSoC Project wiki</a></li>
<li><a href="http://biopython.org/wiki/Main_Page">Biopython</a></li>
<li><a href="http://iob.uga.edu/">UGA Bioinformatics</a></li>
<li><a href="http://getpelican.com/">Pelican</a></li>
<li><a href="http://python.org/">Python.org</a></li>
<li><a href="http://jinja.pocoo.org/">Jinja2</a></li>
</ul>
</div><!-- /.blogroll -->
<div class="social">
<h2>social</h2>
<ul>
<li><a href="https://www.facebook.com/ruan.zheng.7">Facebook</a></li>
<li><a href="http://zr1991.blogspot.com/">Google Blogger</a></li>
<li><a href="http://user.qzone.qq.com/8390905">Qzone</a></li>
</ul>
</div><!-- /.social -->
</section><!-- /#extras -->
<footer id="contentinfo" class="body">
</footer><!-- /#contentinfo -->
<script type="text/javascript">
var disqus_shortname = 'zruansblog';
(function () {
var s = document.createElement('script'); s.async = true;
s.type = 'text/javascript';
s.src = 'http://' + disqus_shortname + '.disqus.com/count.js';
(document.getElementsByTagName('HEAD')[0] || document.getElementsByTagName('BODY')[0]).appendChild(s);
}());
</script>
</body>
</html>