<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Evolution of spliceosomal introns following endosymbiotic gene transfer</title>
<meta name="Subject" content="BMC Evolutionary Biology 2010 10:57. doi:10.1186/1471-2148-10-57"/>
<meta name="Author" content="Nahal Ahmadinejad"/>
<meta name="Creator" content="Arbortext Advanced Print Publisher 10.0.1082/W Unicode"/>
<meta name="Producer" content="Acrobat Distiller 9.0.0 (Windows)"/>
<meta name="CreationDate" content=""/>
</head>
<body>
<pre>
Ahmadinejad et al. BMC Evolutionary Biology 2010, 10:57
http://www.biomedcentral.com/1471-2148/10/57
RESEARCH ARTICLE
Open Access
Evolution of spliceosomal introns following endosymbiotic gene transfer
Nahal Ahmadinejad1,2, Tal Dagan1, Nicole Gruenheit1, William Martin1, Toni Gabaldón3*
Abstract
Background: Spliceosomal introns are an ancient, widespread hallmark of eukaryotic genomes. Despite much
research, many questions regarding the origin and evolution of spliceosomal introns remain unsolved, partly due
to the difficulty of inferring ancestral gene structures. We circumvent this problem by using genes that originated by
endosymbiotic gene transfer, for which an intron-less structure at the time of the transfer can be assumed.
Results: By comparing the exon-intron structures of 64 mitochondrial-derived genes that were transferred to the
nucleus at different evolutionary periods, we can trace the history of intron gains in different eukaryotic lineages.
Our results show that the intron density of genes transferred relatively recently to the nuclear genome is similar to
that of genes that originated from more ancient transfers, indicating that gene structure can be rapidly shaped by intron
gain after the integration of the gene into the genome and that this process is mainly determined by forces acting
specifically on each lineage. We analyze 12 cases of mitochondrial-derived genes that have been transferred to the
nucleus independently in more than one lineage.
Conclusions: Remarkably, the proportion of shared intron positions that were gained independently in
homologous genes is similar to that proportion observed in genes that were transferred prior to the speciation
event and whose shared intron positions might be due to vertical inheritance. A particular case of parallel intron
gain in the nad7 gene is discussed in more detail.
Background
Many eukaryotic genes contain spliceosomal introns [1]:
segments of non-coding sequence that are excised from
the pre-mRNA by the spliceosome complex [2]. Spliceosomal introns have been found at widely varying frequencies
in all sequenced eukaryotes and are absent from all prokaryotic genomes sequenced to date [3]. These findings
have been discussed in the context of two alternative
hypotheses. The introns-early hypothesis states that spliceosomal introns were present in the last common
ancestor of prokaryotes and eukaryotes but were subsequently lost in all prokaryotes [4]. In contrast, the
introns-late hypothesis links the origin of spliceosomal
introns to the emergence of eukaryotes. According
to the introns-late hypothesis, spliceosomal introns are
thought to have originated from self-splicing group II introns
during the evolution of eukaryotes [5]. This model is
supported by similarities between group II introns and
the catalytic snRNA components of the spliceosome,
suggesting that they might have had a common ancestor
[6,7]. The fact that group II introns are found in bacterial and mitochondrial genomes suggests a possible evolutionary connection between spliceosomal introns and
the development of mitochondria [8,9]. These cell organelles originated by endosymbiosis from an alpha-proteobacterial ancestor [10]. In the course of evolution,
their genomes were reduced through gene loss but also,
to a large extent, through the transfer of many genes to
their host genome [11,12]. These endosymbiotic gene
transfers could have spread group II introns into the
host genome, which, in turn, might have initiated the
evolution of spliceosomal introns and the spliceosome.
Additionally, these events might also have created
a selective force favouring the evolution of a nucleus,
which physically separates splicing
from translation [9].
* Correspondence: [email protected]
3 Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr Aiguader, 88 Barcelona 08003, Spain
© 2010 Ahmadinejad et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
The introns-early and introns-late hypotheses have
been discussed in the literature until recently [13,14],
with every newly sequenced genome adding more
information to our understanding of intron evolutionary
dynamics throughout eukaryotic lineages. Nowadays
there is a broader consensus around the introns-late
hypothesis, although the mechanisms and dynamics of
intron gain and loss in eukaryotes are still a matter of
debate. Recently, many studies have focused on inferring
rates of intron gain and loss across the evolution of
eukaryotes. The results reveal large differences in intron
gain and loss rates in different lineages [15-21]. Other
studies have traced the evolution of introns across the
major eukaryotic lineages by using different evolutionary
models [22,23].
One of the difficulties of modeling intron evolution is
that ancestral gene structures are generally unknown.
Therefore, models rely on certain parameters that are
used to infer ancestral states of intron presence or
absence. We circumvent this problem here by using a
set of nuclear genes that originated by endosymbiotic
gene transfer. These genes did not contain spliceosomal
introns when they were transferred to the host genome,
so that the introns found in these genes must all have
been gained after the integration of the gene. We exploit
the circumstance that nuclear-encoded genes of mitochondrial origin can be identified by their sequence
similarity and phylogenetic proximity to their alpha-proteobacterial homologs [11,24]. In particular, we
focus on genes with a clear-cut proto-mitochondrial origin, as reported by phylogenetic analyses of mitochondrial ribosomal proteins [25] and protein complexes
of the oxidative phosphorylation (OXPHOS) pathway
[11,26,27]. Our results reveal highly dynamic, species-specific intron evolution that can shape the intron-exon structure of a transferred
gene relatively rapidly. Hence, the intron density, exon symmetry and intron
phase distribution of recently transferred genes are similar
to those of other genes in the genome. We find several instances
of independent parallel transfers of genes. Comparing
their ratio of shared intron positions to those of genes
that vertically derive from a single transfer event, our
results indicate that, for our set of genes, the proportion
of shared intron positions between genes that were
transferred independently on more than one occasion is
similar to those that were transferred in a single event.
Finally, we provide an in-depth analysis of a clear-cut case
of an intron that was inserted at identical positions in
the nad7 gene, which was transferred twice independently, in the plant and animal lineages.
Results and discussion
Proto-mitochondrial derived genes are not different from
other genes in terms of their intron structure
We compiled a list of 64 nuclear-encoded human genes
of proto-mitochondrial origin [11,25-27]. These include
44 genes that encode proteins of the mitochondrial
ribosome and 20 genes of the oxidative phosphorylation
(OXPHOS) pathway (Additional file 1). The intron-exon
structure of these genes and their homologs across a
broad set of 18 eukaryotic organisms was determined by
comparing each protein sequence with the respective
genomic sequence (see Methods). The set of eukaryotic
genomes includes three plant/green alga genomes, five
fungi, six metazoans and four protists (Additional file 2).
The distributions of intron densities, intron phases, and
symmetric and asymmetric exons in proto-mitochondrial derived OXPHOS and ribosomal genes are shown
in Figure 1. Intron densities range from 0 to 6 introns/kb
of coding sequence and always fall
within the normal range of the species considered [14].
The same can be observed for other characteristics such
as the prevalence of phase 0 introns and symmetrical
exons. A bias towards phase 0 introns is frequently observed and is often linked to a phase preference of newly
gained introns [28,29]. The 5:3:2 ratio of phase 0, phase
1 and phase 2 introns found in this study for the proto-mitochondrial genes is in accordance with
results reported for genes of different origins [30,31].
Finally, our finding that 0-0 exons account for the
majority of symmetrical exons is also in line with general observations in eukaryotic genomes [30]. Thus,
according to their intron densities, exon symmetries and
phase distributions, proto-mitochondrial derived
OXPHOS and ribosomal genes are indistinguishable
from other genes in eukaryotic genomes. In a similar
study of chloroplast-derived genes in plant genomes,
Basu et al. [32] found significantly, but only slightly,
lower intron densities in genes transferred from
the chloroplast than in ancestral eukaryotic genes. In
contrast, in a study by Roy et al. [33] little intron gain
was detected in genes acquired by lateral transfer from
prokaryotic donors.
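The quantities compared in this section can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code: the function names and the example coordinates are assumptions, and an intron is described simply by the number of coding nucleotides that precede it.

```python
from collections import Counter


def intron_density(n_introns, cds_length_nt):
    """Number of introns per 1 kb of coding sequence."""
    return n_introns / (cds_length_nt / 1000.0)


def intron_phase(cds_offset):
    """Phase of an intron that follows `cds_offset` coding nucleotides.

    0: between two codons, 1: after the first codon base,
    2: after the second codon base.
    """
    return cds_offset % 3


def exon_symmetry(upstream_phase, downstream_phase):
    """Classify an internal exon by the phases of its flanking introns."""
    if upstream_phase == downstream_phase:
        return "symmetric"
    return "asymmetric"


# Hypothetical gene: 900 coding nucleotides, introns after these offsets.
offsets = [120, 303, 452, 600]
phases = [intron_phase(o) for o in offsets]
print(intron_density(len(offsets), 900))
print(Counter(phases))
print([exon_symmetry(a, b) for a, b in zip(phases, phases[1:])])
```

In this scheme a symmetric 0-0 exon is one whose upstream and downstream introns both have phase 0, so its length is a multiple of three.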
Lack of correlation between time of endosymbiotic gene
transfer and intron density
Figure 1 Intron densities and distributions of intron phases and exon symmetry are shown for all proto-mitochondrial genes of the
oxidative phosphorylation pathway and the ribosomal mitochondrial proteins and their homologs. The intron density is given as the
number of introns per 1 kb of coding sequence for each species for the groups animals (cel: Caenorhabditis elegans, dme: Drosophila
melanogaster, dre: Danio rerio, hsa: Homo sapiens, rno: Rattus norvegicus), fungi (afu: Aspergillus fumigatus, spo: Schizosaccharomyces pombe, sce:
Saccharomyces cerevisiae, yli: Yarrowia lipolytica, cgl: Candida glabrata), protists (ddi: Dictyostelium discoideum, tps: Thalassiosira pseudonana, lma:
Leishmania major, pfa: Plasmodium falciparum), and plants/green alga (cre: Chlamydomonas reinhardtii, osa: Oryza sativa, ath: Arabidopsis thaliana).
The average intron densities for the different species are indicated by horizontal lines; values were taken or computed from the literature
[38,57-59]. Intron phases are presented as percentages for all genes. The percentages of exon symmetry are shown separately for symmetric and
asymmetric exons, in which all possible symmetries are considered.
Figure 2 The tree represents the current view of phylogenetic relationships between the lineages sampled in this analysis, as summarized
by Roger and Simpson [60]. Inferred timing for the transfers of genes from the mitochondrion to the host nucleus is labeled on the branches.
The timing of each transfer depends on the presence or absence of each gene in the mitochondrial genome and the phylogeny. Proto-mitochondrial genes of a) the oxidative phosphorylation pathway, b) ribosomal mitochondrial proteins.
Mapping the relative time of endosymbiotic gene transfer from the mitochondrion (see Methods) onto the
phylogenetic tree of eukaryotes [34], and considering a
parsimonious scenario, we can approximate the history
of endosymbiotic gene transfers to the nuclear genome,
and thereby establish a relative ordered timing of the
events (Figure 2). It must be noted that a parsimony
approach may be affected by incomplete taxonomic
sampling and errors in the species tree. To limit such
effects we used all mitochondrial genome data available in the NCBI database and left unresolved
those transfers that could not be placed with confidence
due to multifurcations in the tree of eukaryotes. This
approach served to establish a relative timing of endosymbiotic gene transfer events for some genes and
taxonomic groups that are in resolved parts of the tree
for which mitochondrial genomes are well sampled. In
particular, for genes transferred within the metazoan
lineage, which is densely sampled in terms of mitochondrial genomes, we could classify genes into relatively
more ancient and more recent transfers. In order to test
the variation of intron gain over time, we compared the
intron densities of early and late transfers in the genomes of
four metazoan species: the vertebrates Homo sapiens
and Danio rerio, the insect Drosophila melanogaster and
the nematode Caenorhabditis elegans (Figure 3). Our
results show no correlation between intron density and
the time of the gene transfer. Instead, differences
between the densities of the corresponding genes in different species are generally larger than the differences
observed between genes transferred at different
evolutionary stages. This indicates that intron densities
are governed by lineage-specific constraints and are
independent of the time of the transfer event. This is
consistent with previous findings. For instance, an
extensive lineage-specific loss of introns from an intron-rich ancestor has been suggested for some chromalveolate lineages [35]. Our results suggest that intron
gain and not just reduced intron loss could be responsible for the current high densities found in plant and
animal genomes. In fact, intron gain is the only process
that can explain the current high intron densities in
recently transferred genes. Nevertheless, the existence of
an intron-rich ancestor of eukaryotes is strongly supported by a high rate of shared intron positions between
animals, fungi and plants [22,23,36].
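One way to read the parsimonious placement of transfers is sketched below, under two assumptions made purely for illustration and not stated in the text: a transfer to the nucleus is treated as irreversible, and gene loss is ignored. Each maximal clade in which no sampled species retains the gene in its mitochondrial genome then counts as one transfer on the branch leading to that clade. The nested-tuple tree, the species labels and the function names are hypothetical.

```python
def leaves(tree):
    """Return the leaf names of a nested-tuple tree as a list."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def transfer_clades(tree, mito_encoded):
    """Return the maximal clades in which no species keeps the gene in the
    mitochondrial genome. Under the irreversibility assumption, each such
    clade corresponds to one inferred transfer on its stem branch."""
    clade = leaves(tree)
    if not any(name in mito_encoded for name in clade):
        return [sorted(clade)]
    if isinstance(tree, str):
        return []  # a single species that still has the mitochondrial copy
    found = []
    for child in tree:
        found.extend(transfer_clades(child, mito_encoded))
    return found


# Toy topology (an assumption): the gene is still mitochondrial in A_thaliana.
toy_tree = (("animals", "fungi"), ("C_reinhardtii", "A_thaliana"))
print(transfer_clades(toy_tree, {"A_thaliana"}))
```

For this toy input the function reports two clades, one containing animals and fungi and one containing only the green alga, which mirrors the pattern described for genes transferred before the animal/fungi split and again in Chlamydomonas reinhardtii.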
Figure 3 The intron density is shown for genes that were transferred at different times during evolution from the
mitochondrion to the nucleus in Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans and Danio rerio. a) proteins of the
oxidative phosphorylation pathway, b) ribosomal mitochondrial proteins. Although the most ancient class of transfers (nad8, nad10, rpl32, rpl19)
is unassigned in Figure 2, we consider them to be relatively more ancient than nad11, rps10 and rps3 because the latter are nuclear only in
unikonts and the former are nuclear in most eukaryotic groups.
Significant inter-kingdom conservation of plant-animal
intron positions
To assess the extent of shared intron positions we
aligned the protein sequences of transferred genes in
the different lineages included in the study (see Methods). Consistent with previous results [36], most shared intron
positions are found between the most divergent groups,
animals and plants (Tables 1 and 2). The distribution of
the number of species-specific introns reflects the overall intron density in each species. When only those
intron positions in highly conserved alignment regions
identified with Block Maker [37] are compared, the numbers of shared
intron positions are reduced but still show the same
trend (Tables 1 and 2). Only a few introns are found at
the same position across more than two groups of
organisms. Three intron positions are shared between
animals, plants and fungi. Three introns at the
same positions are also shared between animals, fungi and
Dictyostelium discoideum. Two shared intron positions
are found in animals, plants and Dictyostelium
discoideum.
Table 1 Number of species-specific and shared intron positions in proteins of the oxidative phosphorylation pathway.
                Animals1   Plants2    Fungi3     D. discoideum  L. major  P. falciparum  T. pseudonana
Animals1        287        60 (8.62)  12 (1.72)  4 (0.58)       -         -              1 (0.14)
Plants2         4 (0.58)   285        1 (0.14)
Fungi3          1 (0.14)   -          78
D. discoideum   1 (0.14)   -          -          15             -         1 (0.14)       -
L. major        -          -          -          -              -         -              -
P. falciparum   -          -          -          -              -         7              -
T. pseudonana   -          -          -          -              -         -              24
For the Plants2 and Fungi3 rows, the entries in the four protist columns are available only as the unassigned sequence: -, -, -, 3 (0.43), -.
Species-specific intron positions are shown on the diagonal. Shared intron positions between the different groups of species within the complete multiple protein
alignments are shown above the diagonal; shared intron positions only within conserved regions of the alignments are shown below the diagonal. The percentage of
shared positions is indicated in brackets.
1 Homo sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Caenorhabditis elegans, Drosophila melanogaster.
2 Arabidopsis thaliana, Chlamydomonas reinhardtii, Oryza sativa.
3 Aspergillus fumigatus, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Candida glabrata, Yarrowia lipolytica.
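Comparing intron positions across homologs requires placing each intron in the coordinates of the multiple alignment. The sketch below shows one way to do this; it is an illustration, not the procedure used for Tables 1 and 2. It assumes that an intron is recorded as a pair (residue index, phase) on the ungapped protein and that two introns count as shared only when they fall in the same alignment column with the same phase. The toy sequences and function names are hypothetical.

```python
def residue_to_column(aligned_seq):
    """Map each ungapped residue index (0-based) to its alignment column."""
    columns = []
    for col, char in enumerate(aligned_seq):
        if char != "-":
            columns.append(col)
    return columns


def intron_columns(aligned_seq, introns):
    """Convert (residue_index, phase) introns to (alignment_column, phase)."""
    col_of = residue_to_column(aligned_seq)
    return {(col_of[res], phase) for res, phase in introns}


def shared_positions(cols_a, cols_b):
    """Intron positions present in both sequences (same column and phase)."""
    return cols_a.intersection(cols_b)


# Toy alignment of two homologs; gaps are written as '-'.
animal = intron_columns("MK-LQA", [(2, 0)])  # intron at residue L, phase 0
alga = intron_columns("MKTL-A", [(3, 0)])    # intron at residue L, phase 0
print(shared_positions(animal, alga))
```

Because the two toy introns sit at different residue indices but in the same alignment column, they are reported as one shared position.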
The timing of the transfer events reveals independent
transfers in different species, mostly involving the green
alga Chlamydomonas reinhardtii and other groups (Figure 2). For instance, the genes nad7, nad9 and atp1 of
the oxidative phosphorylation pathway were transferred twice
independently: in the lineage leading to animals and fungi and in Chlamydomonas
reinhardtii (Figure 2a). The same pattern is observed
in the timing of gene transfer events for the mitochondrial ribosomal proteins (Figure 2b). Five gene
transfers took place independently in Chlamydomonas
reinhardtii, Leishmania major, Plasmodium falciparum,
and before the split of animals and fungi (rpl2, rpl5,
rpl16, rps4, rps7). A list of putative independent transfers is provided in Table 3.
The large number of independently transferred genes
in the green algal lineage allows us to compare the
occurrence of shared intron positions between genes
transferred independently and those derived from a
common nuclear-encoded ancestor. The observation
that most shared intron positions are found between
distantly-related species can be explained either by conservation of intron positions from a common ancestor
or by parallel intron gain. Different evolutionary models
infer different rates of parallel intron insertion. According to Qiu
and colleagues, most shared intron positions were
gained independently [31], whereas most other models
provide lower estimates (5-18%) for the fraction of
shared intron positions that result from independent
insertions [23,38,39].
Table 2 Number of species-specific and shared intron positions in ribosomal mitochondrial proteins.
                Animals1   Plants2    Fungi3     D. discoideum  L. major  P. falciparum  T. pseudonana
Animals1        318        12 (2.49)  4 (0.83)   3 (0.62)       -         -              2 (0.42)
Plants2         6 (1.25)   105        1 (0.21)   -              -         -              -
Fungi3          1 (0.21)   1 (0.21)   17         -              -         -              -
D. discoideum   1 (0.21)   -          -          6              -         -              -
L. major
P. falciparum   -          -          -          -              -         11             -
T. pseudonana   1 (0.21)   -          -          -              -         -              3
See the legend of Table 1 for details.
Table 3 Genes that are independently transferred and which could be identified with their mitochondrial gene names.
Gene    Independent transfers (lineages separated by semicolons)
nad7    C. reinhardtii; Animals, Fungi
nad9    C. reinhardtii; Animals, Fungi
nad11   Animals, Fungi; L. major; D. discoideum
rps2    C. reinhardtii; Plants and green alga; A. thaliana; Animals, Fungi; L. major; P. falciparum
rps11   C. reinhardtii; A. thaliana; Animals, Fungi; L. major; P. falciparum
rps14   C. reinhardtii; A. thaliana; Animals, Fungi; L. major; P. falciparum
rpl2    C. reinhardtii; Animals, Fungi; L. major; P. falciparum
rpl16   C. reinhardtii; Animals, Fungi; L. major; P. falciparum
rpl7    C. reinhardtii; Animals, Fungi; L. major; P. falciparum
rpl12   C. reinhardtii; Animals, Fungi; P. falciparum
rpl11   Plants and green alga; Animals, Fungi; L. major; P. falciparum; T. pseudonana
rpl14   Plants and green alga; Animals, Fungi; L. major; P. falciparum
The first three genes (nadx) are genes of the oxidative phosphorylation pathway; the other nine genes (rpsx/rplx) are genes of ribosomal mitochondrial proteins
of the small and the large ribosomal subunit, respectively.
Shared positions between distant species such as animals and plants have been considered ancient positions
[22,23], considering that the probability for an independent gain of two introns at the same position is very
small. Our data, however, show that this is not necessarily the case. Shared
intron positions between the green alga
Chlamydomonas reinhardtii and some of the animals
(Homo sapiens, Mus musculus, Rattus norvegicus) were identified both in the proto-mitochondrial genes
nad7 and nad11, which were independently transferred in
the eukaryotes under consideration (Table 3), and in the
gene sdh2, which was transferred in the basal eukaryotic
lineage (Figure 2a; transfer at the root of the tree). A
comparison of the percentage of shared intron
positions between these groups reveals a proportion almost twice as high
in the genes that were transferred
independently (4.38%) as in
the other genes (2.52%). This means that at large evolutionary distances shared positions are not always indicative of ancestral intron positions. The
percentage of shared positions between Chlamydomonas
and animals in these genes is markedly lower than in previous reports, which found ~23% of introns shared between
human and Arabidopsis genes [36]. However, it must be
noted that the specific nature and small size of our
dataset make it difficult to extrapolate our findings to the
general case. For the gene nad7, the phylogenetic distribution in nuclear and mitochondrial genomes and the
evolutionary history, which includes a parallel intron gain,
were reconstructed in detail.
An unambiguous parallel intron gain at identical sites in
two independently transferred nad7 genes
To gain more detailed insight into the parallel insertion
of introns at identical positions, we present here in detail
a particular example from our dataset, that of a parallel
intron in the nad7 gene. The gene nad7 was transferred
independently before the split of animals and fungi and
in the green alga Chlamydomonas reinhardtii, and the
only shared intron position was found between animals
and the green alga. To obtain a more detailed view of the
evolution of the gene nad7, we also added to the phylogenetic
analysis the mitochondrial-encoded homologs of the
two protists Dictyostelium discoideum and Thalassiosira
pseudonana, the plants Arabidopsis thaliana and Oryza
sativa, the green algae Pseudendoclonium akinetum and
Ostreococcus tauri, and the moss Physcomitrella patens,
as well as the nuclear-encoded nad7 gene of the green
alga Volvox carteri. The presence of the gene nad7 in the
mitochondrial genome of all other plants, the moss and
the two green algae Pseudendoclonium akinetum [40]
and Ostreococcus tauri [41] supports the evolutionary
scenario of independent transfer in the two green algae
Chlamydomonas reinhardtii and Volvox carteri and before the animal/fungi split. These (at least two) independent transfer
events are also supported by the reconstructed phylogenetic tree, which contains both nuclear and mitochondrial
genes as well as alpha-proteobacterial nad7 homologs as
the outgroup (Figure 4).
Figure 4 A comprehensive phylogeny of the gene nad7 including mitochondrial and nuclear-encoded homologs. The tree is rooted by
alpha-proteobacterial homologs. The two independent gene transfers of nad7 from the mitochondrion to the nucleus are labelled on the tree.
The nucleotide region of the shared intron position is shown. The different lengths of the intron sequences are indicated in parentheses; the
splice sites are marked in bold.
The nuclear-encoded Chlamydomonas reinhardtii nad7
gene possesses 11 introns. A single shared intron position is
found in a conserved region of the alignment between the
green alga Chlamydomonas reinhardtii and the animals.
The introns are all of phase 0 and lie at exactly the same position
of the gene, as shown in Figure 4. In all sequences, the
codon before the intron position codes for the amino acid
glutamine. With one exception each, the mitochondrial-encoded
and the nuclear-encoded genes use different glutamine codons:
CAA is found in the mitochondrial genes and CAG in the nuclear genes, in agreement with the different average codon usage of mitochondrial
and nuclear genes (Additional file 3). The codon before the
intron together with the first nucleotide (G) after the intron
corresponds to a classical proto-splice site, (C|A)AG - (A|G)
(Figure 4) [42].
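The effect of the codon difference on the proto-splice site can be checked mechanically. The following sketch encodes the motif (C|A)AG - (A|G) as a regular expression and tests the last exonic codon together with the first nucleotide after the intron; the function name and inputs are illustrative assumptions.

```python
import re

# (C|A)AG immediately before the intron, (A|G) immediately after it.
PROTO_SPLICE = re.compile(r"[CA]AG[AG]")


def is_proto_splice_site(codon_before, first_nt_after):
    """Return True if the exonic sequence flanking an intron position
    matches the classical proto-splice site (C|A)AG - (A|G)."""
    flank = (codon_before + first_nt_after).upper()
    return PROTO_SPLICE.fullmatch(flank) is not None


for codon in ("CAA", "CAG"):
    print(codon, is_proto_splice_site(codon, "G"))
```

With the nuclear glutamine codon CAG followed by G the motif is present, whereas the mitochondrial codon CAA followed by G does not match.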
Interestingly, the shared intron position is flanked
by two group II introns in the mitochondrial sequences
of the moss Physcomitrella patens and the plants Arabidopsis thaliana and Oryza sativa, 15 codons
upstream and eight codons downstream, respectively
(Additional file 4). Although it might be tempting to
speculate on a possible role of these flanking
group II introns in the formation of the spliceosomal
intron after the transfer, the fact that such introns are
rare in, if not completely absent from, most mitochondrial
genomes implies that most introns in recently transferred genes have been formed by alternative mechanisms. Altogether, our observations indicate that the
gene nad7 was transferred twice independently and
subsequently adapted its codon usage to that of nuclear
genes. This gave rise to a proto-splice
site in the sequence of the nad7 gene, which, in turn,
enabled the insertion of an intron at the same position
in the different lineages.
Conclusions
Arguments in favor of intron antiquity at identical
intron positions are generally founded on weighing the
relative probabilities of massive intron loss versus a few
parallel intron gains [14,23]. Although several clear-cut
cases of parallel intron gains have been previously
described [43], this process is still considered a rarity.
Our results present several independent intron gains in
homologous genes that were transferred independently
from the mitochondrion to the nucleus, showing that
independent acquisition of introns has been relatively
frequent in this group of genes. In fact, for the cases we
have examined in more detail, the proportion of shared positions
that arose by parallel intron gain is similar to the proportion of conserved
shared positions at the same evolutionary distance.
These results, albeit based on a limited sample of a specific set of genes, indicate that shared intron positions
can, in some instances, arise independently by parallel
insertions in distantly-related lineages.
Methods
Sequence data
All human nuclear encoded genes of the oxidative phosphorylation pathway and mitochondrial ribosomal proteins were obtained from the SwissProt database [44].
Genomic nucleotide and protein sequences of 18 completely sequenced eukaryotes were downloaded from the
GenBank [45] and JGI (http://www.jgi.doe.gov/) databases
as of March 2007. Local databases were created for both the nucleotide and the protein sequences. Three
plant/green alga genomes were included in the analyses:
Arabidopsis thaliana, Oryza sativa, and Chlamydomonas reinhardtii. The five fungal genomes were Aspergillus fumigatus, Candida glabrata, Saccharomyces cerevisiae,
Schizosaccharomyces pombe, and Yarrowia lipolytica, and the
six animal genomes were Danio rerio, Drosophila melanogaster, Caenorhabditis elegans, Homo sapiens, Mus musculus, and Rattus norvegicus. Four protist
genomes were chosen: Thalassiosira pseudonana, Plasmodium falciparum, Leishmania major, and Dictyostelium discoideum (Additional file 2).
Proto-mitochondrial genes
The information about the proto-mitochondrial origin
of the mitochondrial ribosomal proteins was taken from
[25]. The proteins of the oxidative phosphorylation
pathway were downloaded from SwissProt [44] based on
the information for the proteins of complex I [27]. To
assign the corresponding mitochondrial gene names and
to re-test the proto-mitochondrial origin of the
human genes of the oxidative phosphorylation pathway,
a BLAST [46] search was performed against the genome
of the alpha-proteobacterium Rickettsia prowazekii. If the
search resulted in a significant hit annotated with a
function in the electron transport chain, a second
BLAST search was performed against the mitochondrial genome of the protozoon Reclinomonas americana, which
has the largest number of mitochondrial-encoded proteins [47]. With the information about an existing
homolog in Reclinomonas americana, the mitochondrial
gene name could be assigned in some cases to the
human nuclear-encoded mitochondrial proteins.
To find eukaryotic homologs of the proto-mitochondrial genes, we used BLAST with each human protein
as query against a protein database comprising the 18
species. Hits with an E-value ≤ 1e-06 were
retained. For each set of homologs, a multiple protein
sequence alignment was constructed using MUSCLE
[48].
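The E-value filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the BLAST results are available in the standard 12-column tabular format (E-value in column 11), and the function name, protein identifiers, and example rows are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-6  # threshold stated in the text

def filter_blast_hits(lines, max_evalue=E_VALUE_CUTOFF):
    """Group subject IDs by query for hits at or below the E-value cutoff.

    Assumes BLAST tabular output with 12 tab-separated columns, where
    column 1 is the query, column 2 the subject, and column 11 the E-value.
    """
    homologs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            homologs[query].add(subject)
    return dict(homologs)

# Hypothetical rows: one strong hit, one weak hit, one hit exactly at the cutoff.
example = [
    "NDUFS2_HUMAN\tath_prot_1\t65.2\t400\t139\t2\t30\t429\t1\t399\t1e-150\t520",
    "NDUFS2_HUMAN\tpfa_prot_9\t28.0\t100\t72\t1\t50\t149\t10\t108\t0.003\t35.0",
    "NDUFS2_HUMAN\tsce_prot_4\t31.5\t200\t137\t3\t40\t239\t5\t201\t1e-06\t52.4",
]
print(filter_blast_hits(example))
```

Each resulting set of subject sequences, together with its query, would then be passed to the alignment step.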
Timing of endosymbiotic gene transfer events
The presence of genes in the mitochondrial genomes of
the 18 species used in this study and of other species was
checked with the mitochondrial gene content tables at
NCBI (http://ncbi.nlm.nih.gov). The phylogenetic relationship between the 18 species [34] was used to assign the
relative time of gene transfers with respect to speciation
events. The relative timing of endosymbiotic gene transfer events was specified for both sets of proteins, the
oxidative phosphorylation and the mitochondrial ribosomal proteins. Combining gene presence/absence information with the taxonomic relationships of the species
results in a reconstruction of gene transfer events from
the mitochondrion to the nucleus. Due to the uncertainty in some nodes of the eukaryotic tree and a sparse
presence pattern of some genes in mitochondrial genomes, the timing of several transfer events was considered unresolved.
Identification of intron positions
In the first step, BLAT [49] was used to align the protein sequence to the genomic sequence of the
corresponding species. The result of BLAT is an alignment of the protein sequence to the exonic regions in
the genome sequence without overlapping ends, where
putative introns are not aligned. To identify the intron
positions, the following filtering steps were implemented
in Perl scripts. The putative intron region had to be
longer than 20 nucleotides and to have canonical
splice sites, meaning that the dinucleotides GT and AG
are found at the beginning and the end of the sequence,
respectively. To verify this inference, 18 nucleotides of
the genomic region surrounding the putative splicing
site were translated into protein sequence and compared
with the query protein. If the translation was identical to
the query sequence, the intron position was recorded
together with the phase of the intron. A similar
method for intron identification was recently published
[50].
Comparing intron positions
Presence/absence matrices of introns were built for each
alignment to compare their positions. Shared intron
positions are defined as introns that are found at exactly
the same amino acid within the multiple protein
sequence alignment. In addition, we determined shared
intron positions only in conserved regions of the alignments. To this end, conserved regions in each alignment
were determined with Block Maker [37], a feature of the
Blocks database [51]. The intron density of a gene is
given as the number of introns per 1 kb of coding
sequence.
Phylogenetic reconstruction of the nad7 gene
Protein sequences were aligned with MUSCLE [48], and
all gapped sites were removed. Because the nad7 data
sample includes eukaryotic nuclear sequences, mitochondrial sequences, and prokaryotic sequences, the
phylogenetic reconstruction method has to take into
account different evolutionary rates [52]. Therefore the
ProtTest [53] program was used to estimate which substitution model fits the data best. The program computes maximum likelihood trees using PhyML [54] under
different substitution models and outputs the most likely
tree according to different criteria. The maximum likelihood values for the trees are then used to perform a
goodness-of-fit test with the AICc (Akaike Information
Criterion with a second-order correction for small sample sizes) [55] and the BIC (Bayesian Information Criterion). In all cases the WAG [56] substitution model with
an estimated proportion of invariable sites and a Γ-distribution (WAG+I+G) was selected as the best-fitting model for nad7. Bootstrap values were calculated
using this model with 100 bootstrap replicates. The phylogenetic tree is rooted by the clade of alpha-proteobacterial nad7 genes.
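For readers unfamiliar with the two criteria, the following minimal sketch shows how AICc and BIC rank candidate models from their maximum log-likelihoods. It uses the standard textbook formulas, not ProtTest's own code. The log-likelihoods, parameter counts, and sample size below are hypothetical values chosen for illustration, and taking the sample size as the alignment length is an assumption.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def aicc(log_lik, k, n):
    """AIC with the second-order correction for small sample sizes."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

def best_model(candidates, n, criterion=aicc):
    """Return the model name with the lowest criterion value.

    candidates maps a model name to (log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: criterion(*candidates[name], n))

# Hypothetical log-likelihoods and parameter counts, for illustration only.
candidates = {
    "WAG": (-12050.0, 59),
    "WAG+G": (-11820.0, 60),
    "WAG+I+G": (-11805.0, 61),
}
n_sites = 350  # assumed sample size (alignment length after gap removal)

print(best_model(candidates, n_sites, criterion=aicc))
print(best_model(candidates, n_sites, criterion=bic))
```

Lower values indicate a better trade-off between fit and model complexity under both criteria. BIC penalizes additional parameters more strongly than AICc once the sample size is moderately large.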
Additional file 1: Table of all human proto-mitochondrial genes of
the a) oxidative phosphorylation pathway and the b) mitochondrial
ribosome with their SwissProt ID and the corresponding
mitochondrial gene name if it could be identified by BLAST against
the mitochondrial genome of Reclinomonas americana.
[ http://www.biomedcentral.com/content/supplementary/1471-2148-10-57-S1.PDF ]
Additional file 2: Database sources of the complete genome and
protein sequences.
[ http://www.biomedcentral.com/content/supplementary/1471-2148-10-57-S2.PDF ]
Additional file 3: Percentage of codon usage for the amino acid
glutamine in a) the mitochondrial genomes and in b) the nuclear
genomes of the species that are included in the analysis of the
parallel intron gain in the gene nad7.
[ http://www.biomedcentral.com/content/supplementary/1471-2148-10-57-S3.PDF ]
Additional file 4: Part of the multiple protein alignment of the gene
nad7.
[ http://www.biomedcentral.com/content/supplementary/1471-2148-10-57-S4.PDF ]
Acknowledgements
TG research is funded in part by grants from the Spanish Ministry of Science
and Innovation (GEN2006-27784-E) and Ministry of Health (CP06/00213). WM
gratefully acknowledges grants from the European Research Council
(Networkorigins) and from the Deutsche Forschungsgemeinschaft (SFB-TR1).
Author details
1Institut für Botanik III, Heinrich-Heine Universität Düsseldorf, Universitätsstr 1,
40225 Düsseldorf, Germany. 2Max Planck Institute for Plant Breeding
Research, Dept. Plant-Microbe Interactions, Carl-von-Linné-Weg 10, 50829
Köln, Germany. 3Bioinformatics and Genomics Programme, Centre for
Genomic Regulation (CRG), Dr Aiguader, 88 Barcelona 08003, Spain.
Authors’ contributions
NA contributed to acquisition, analysis and interpretation of data and
drafting the manuscript. TD contributed to conception and design of the
work, analysis and interpretation of data and drafting the manuscript. NG
contributed to acquisition, analysis and interpretation of data. WM
contributed to conception and design of the work, interpretation of data
and critical revision of the manuscript. TG contributed to conception and
design of the work, interpretation of data and drafting and critical revision
of the manuscript. All authors read and approved the final version of the
manuscript.
Received: 12 May 2009
Accepted: 23 February 2010 Published: 23 February 2010
References
1. Gilbert W: Why genes in pieces?. Nature 1978, 271(5645):501.
2. Nilsen TW: The spliceosome: the most complex macromolecular machine
in the cell?. Bioessays 2003, 25(12):1147-1149.
3. Collins L, Penny D: Complex spliceosomal organization ancestral to
extant eukaryotes. Mol Biol Evol 2005, 22(4):1053-1066.
4. Gilbert W, Glynias M: On the ancient nature of introns. Gene 1993, 135(1-2):137-144.
5. Cech TR: The generality of self-splicing RNA: relationship to nuclear
mRNA splicing. Cell 1986, 44(2):207-210.
6. Valadkhan S: The spliceosome: a ribozyme at heart?. Biol Chem 2007,
388(7):693-697.
7. Toor N, Keating KS, Taylor SD, Pyle AM: Crystal structure of a self-spliced
group II intron. Science 2008, 320(5872):77-82.