diff --git a/papers/Gagnon_Kebe_Tahiri/main.tex b/papers/Gagnon_Kebe_Tahiri/main.tex index 6bd6788fcb..f400943814 100644 --- a/papers/Gagnon_Kebe_Tahiri/main.tex +++ b/papers/Gagnon_Kebe_Tahiri/main.tex @@ -124,22 +124,22 @@ \subsubsection{Environmental data} \item Since the sedimentary characteristics directly influence the distribution of Cumacea \citep{uhlir_adding_2021}, they were included in our data. In this study, they are divided into six ecological niche categories: mud ($n=30$), sandy mud ($n=15$), sand ($n=9$), forams ($n=3$), muddy sand ($n=3$), and gravel ($n=2$). \end{itemize} -\subsubsection{Climatic data} -Wind speed (m/s) (see Figure \ref{fig:fig2}d) at the start and end of sampling and wind direction at the end of sampling were also included, giving the contribution of wind to benthic ecosystem dynamics and the restructuring of species distribution by wind currents and sediment transport \citep{siedlecki2016experiments, waga_recent_2020,saeedi_environmental_2022}. The wind direction at the end of sampling comprises eight orientations: south (S, $n=15$), southwest (SW, $n=15$), northeast (NE, $n=9$), west-southwest (WSW, $n=7$), southeast (SE, $n=6$), north-northwest (NNW, $n=5$), south-southeast (SSE, $n=3$), and east (E, $n=2$). +\subsubsection{Climatic data} +Wind speed (m/s) (see Figure \ref{fig:fig2}d) at the start and end of sampling and wind direction at the end of sampling were also included, giving the contribution of wind to benthic ecosystem dynamics and the restructuring of species distribution by wind currents and sediment transport \citep{siedlecki2016experiments, waga_recent_2020,saeedi_environmental_2022}. The wind direction at the end of sampling comprises eight orientations: south (S, $n=15$), southwest (SW, $n=15$), northeast (NE, $n=9$), west-southwest (WSW, $n=7$), southeast (SE, $n=6$), north-northwest (NNW, $n=5$), south-southeast (SSE, $n=3$), and east (E, $n=2$). 
\subsection{Selected attributes in the BOLDSystem database} -\subsubsection{Taxonomic data} +\subsubsection{Taxonomic data} The family, genus, and scientific name of the Cumacea sampled were integrated into our data to study evolutionary relationships and genetic variation to habitat attributes among the specimens in our dataset. These comprise seven families: Diastylidae ($n=21$), Lampropidae ($n=13$), Leuconidae ($n=12$), Astacidae ($n=7$), Bodotriidae ($n=4$), Ceratocumatidae ($n=3$), and Pseudocumatidae ($n=2$). A total of 20 Cumacea species were found in our sample (see Figure \ref{fig:fig3}). We have also included the sample identity (id) so that each sample remains unique. Some specimens were only identified to genus ($n=1$) or family ($n=5$). -\subsection{Selected attributes from article \cite{uhlir_adding_2021}} -\subsubsection{Other environmental data} +\subsection{Selected attributes from article \cite{uhlir_adding_2021}} +\subsubsection{Other environmental data} The habitat and water mass of the sampling points were the only water attributes taken directly from Table 1 of \citep{uhlir_adding_2021}, as they can give us insight into how they may affect Cumacea genetic diversity and the acclimatization of these species in the GIN seas around Iceland. Thus, the water masses definitions, as described in \citep{uhlir_adding_2021}, were used as a reference: Arctic Polar Water (APW, $n=15$), Iceland Sea Overflow Water (ISOW, $n=15$), North Atlantic Water (NAW, $n=9$), warm Norwegian Sea Deep Water (NSDWw, $n=8$), Arctic Polar Water/Norwegian Sea Arctic Intermediate Water (APW/NSAIW, $n=7$), Labrador Sea Water (LSW, $n=3$), cold Norwegian Sea Deep Water (NSDWc, $n=3$), and Norwegian Sea Arctic Intermediate Water (NSAIW, $n=2$) (see Figure \ref{fig:fig4}). In terms of habitat, we considered the three categories used in \citep{uhlir_adding_2021}: Deep Sea ($n=38$), Shelf ($n=15$), and Slope ($n=9$) (see Figure \ref{fig:fig5}). 
-\subsubsection{Genetic data} +\subsubsection{Genetic data} To better understand the relationships between benthic species and their evolutionary responses, genetic data are required \citep{uhlir_adding_2021}. Thus, the aligned partial DNA sequence of the 16S rRNA mitochondrial gene region of each sample was included in our analyses. This region is standard in phylogeny and phylogeography studies \citep{hugenholtz1998impact} and sufficiently conserved over time to guarantee exact alignments between different species or populations \citep{saccone1999evolutionary}. We examined 62 of the 306 aligned DNA sequences used for phylogeographic analyses by \citep{uhlir_adding_2021}. As some specimens in our sample have their DNA sequence duplicated, or even quadruplicated with a difference of one or two nucleotides, we took into account the longest-aligned DNA sequence of each specimen. \subsection{{\textit{aPhyloGeo} software}\label{aPhyloGeo-software}} -Developed by My-Linh Luu, Georges Marceau, David Beauchemin, and Nadia Tahiri, we used the cross-platform Python software \textit{aPhyloGeo} for our phylogeographic analyses, designed to analyze phylogenetic trees using ecological and geographic attributes (\autoref{lst:main}), enabling us to understand the evolution of species under different environmental conditions \citep{koshkarov_phylogeography_2022}. +For our phylogeographic analyses, we used the cross-platform Python software \textit{aPhyloGeo}, developed by My-Linh Luu, Georges Marceau, David Beauchemin, and Nadia Tahiri. It is designed to analyze phylogenetic trees using ecological and geographic attributes (\autoref{lst:main}), enabling us to understand the evolution of species under different environmental conditions \citep{koshkarov_phylogeography_2022}. 
We selected this software for our analysis because, to our knowledge, it is the first phylogeographic tool capable of establishing similarity or dissimilarity between species genetics and environmental, climatic, and geographical attributes \citep{koshkarov_phylogeography_2022} - precisely the objective of our study. The \textit{aPhyloGeo} software offers several key functionalities: @@ -209,26 +209,26 @@ \subsection{{\textit{aPhyloGeo} software}\label{aPhyloGeo-software}} \begin{enumerate} \item \textbf{The first step} was to collect DNA sequences from Cumacea of sufficient quality for the needs of our results \citep{koshkarov_phylogeography_2022}. In this study, 62 Cumacea samples were selected to represent 62 partial sequences of the 16S rRNA mitochondrial gene. We then included, from our database, two climatic attributes, namely wind speed (m/s) at the start and end of the sampling; three environmental characteristics, such as depth (m) at the start of sampling, water temperature ($^\circ$C), and O\textsubscript{2} concentration (mg/L); and two geographic variables, latitude (DD) at the end of sampling and longitude (DD) at the start of sampling. -\item \textbf{In the second step}, trees were generated separately from biological, spatial, meteorological, and genetic data. Concerning spatial attributes, we calculated the dissimilarity between each pair of Cumacea from distinct spatial conditions \citep{koshkarov_phylogeography_2022}. This produced a symmetrical square matrix \citep{koshkarov_phylogeography_2022}. The {neighbor-joining algorithm}\footnote{It is a method used to construct phylogenetic trees using distance matrices.} was used to build the spatial tree from this matrix \citep{koshkarov_phylogeography_2022}. Each geographic attribute gives rise to a tree. If there are $m$ windows affected by this attribute, there will be $m$ geographic trees. The same approach was applied to biological, meteorological, and genetic data. 
+\item \textbf{In the second step}, trees were generated separately from biological, spatial, meteorological, and genetic data. Concerning spatial attributes, we calculated the dissimilarity between each pair of Cumacea from distinct spatial conditions \citep{koshkarov_phylogeography_2022}. This produced a symmetrical square matrix \citep{koshkarov_phylogeography_2022}. The {neighbor-joining algorithm}\footnote{It is a method used to construct phylogenetic trees using distance matrices.} was used to build the spatial tree from this matrix \citep{koshkarov_phylogeography_2022}. Each geographic attribute gives rise to a tree. If there are $m$ windows affected by this attribute, there will be $m$ geographic trees. The same approach was applied to biological, meteorological, and genetic data. -For the genetic data, phylogenetic reconstruction was repeated to build genetic trees based on 62 partial sequences of the 16S rRNA mitochondrial gene, considering only data within a window that progresses along the alignment \citep{koshkarov_phylogeography_2022}. This displacement can vary according to the steps and the size of the window defined by the user (their length is determined by the number of base pairs (bp)) \citep{koshkarov_phylogeography_2022}. +For the genetic data, phylogenetic reconstruction was repeated to build genetic trees based on 62 partial sequences of the 16S rRNA mitochondrial gene, considering only data within a window that progresses along the alignment \citep{koshkarov_phylogeography_2022}. This displacement can vary according to the steps and the size of the window defined by the user (their length is determined by the number of base pairs (bp)) \citep{koshkarov_phylogeography_2022}. 
In our case, we set up the \textit{aPhyloGeo} software as follows: $pairwiseAligner$ for sequence alignment; $\text{Hamming distance}$ to measure simple dissimilarities between sequences of identical length; $\text{Wider Fit by elongating with Gap (starAlignment)}$ algorithm takes alignment gaps into account, which is often mandatory in the case of major deletions or insertions in the sequences; $\text{windows\_size}$: 1 nucleotide (nt); and finally, $\text{step\_size}$: 10 nt. The last two configurations imply that for each 1 nt window, a phylogenetic tree is produced using the nucleotide of each Cumacea, then the window is moved by 10 nt, creating a new tree. Each window in the alignment will give a genetic tree. If there are $n$ windows, there will be $n$ phylogenetic trees. Genetic trees will be used in an object called $T_1$, while spatial and ecological trees are used in another object called $T_2$. -\item \textbf{In the third step}, the genetic trees constructed in each sliding window are compared with ecosystemic, atmospheric, and regional trees using Robinson-Foulds distance \citep{robinson_comparison_1981}, normalized Robinson-Foulds distance and Euclidean distance. These contribute to understanding the correspondence between Cumacea genetic sequences and their habitat. The approach also takes bootstrapping into account \citep{koshkarov_phylogeography_2022}. The results of these metrics were obtained using the functions $robinson\_foulds(tree1, tree2)$ and $euclidean\_dist(tree1, tree2)$ from the \textit{aPhyloGeo} software and were organized by the main function (\autoref{lst:main}). Those for the normalized Robinson-Foulds distance were obtained with the function $robinson\_foulds(tree1, tree2)$ (see the last line of code in \autoref{lst:robinsonFoulds}). 
The metric output tells us which of our attributes has the greatest divergence of phylogenetic relationships in our samples, based on the magnitude of the metric distances (see Figure \ref{fig:fig6} and Figure \ref{fig:fig7}). +\item \textbf{In the third step}, the genetic trees constructed in each sliding window are compared with ecosystemic, atmospheric, and regional trees using Robinson-Foulds distance \citep{robinson_comparison_1981}, normalized Robinson-Foulds distance and Euclidean distance. These contribute to understanding the correspondence between Cumacea genetic sequences and their habitat. The approach also takes bootstrapping into account \citep{koshkarov_phylogeography_2022}. The results of these metrics were obtained using the functions $robinson\_foulds(tree1, tree2)$ and $euclidean\_dist(tree1, tree2)$ from the \textit{aPhyloGeo} software and were organized by the main function (\autoref{lst:main}). Those for the normalized Robinson-Foulds distance were obtained with the function $robinson\_foulds(tree1, tree2)$ (see the last line of code in \autoref{lst:robinsonFoulds}). The metric output tells us which of our attributes has the greatest divergence of phylogenetic relationships in our samples, based on the magnitude of the metric distances (see Figure \ref{fig:fig6} and Figure \ref{fig:fig7}). In addition to identifying the specific attribute, a sliding-window approach enables the precise localization of subtle sequences with high rates of genetic mutation \citep{koshkarov_phylogeography_2022}. This method requires shifting a fixed-size window over the alignment of genetic sequences, allowing phylogenetic trees to be reconstructed for each part of the sequence. It therefore allows us to recognize changes in evolutionary relationships along the partial sequence region of the 16S rRNA mitochondrial gene of Cumacea species. 
This method is essential for determining whether Cumacea-specific gene sequences in this region of their genome may be affected by certain ecological or spatial attributes of their habitat (see Figure \ref{fig:fig6} and Figure \ref{fig:fig7}). \end{enumerate} \subsection{Metrics}\label{metrics} -Our phylogeographic study used four distance measures to quantify differences between phylogenetic trees and habitat trees and assess dissimilarities between genetic sequences and our parameters. This enables a comprehensive analysis of the evolutionary dynamics of Cumacea populations in different environmental contexts. +Our phylogeographic study used four distance measures to quantify differences between phylogenetic trees and habitat trees and assess dissimilarities between genetic sequences and our parameters. This enables a comprehensive analysis of the evolutionary dynamics of Cumacea populations in different environmental contexts. The following section presents a more concise version of the functions mentioned in the second and third steps of \autoref{aPhyloGeo-software}: \subsubsection{Robinson-Foulds distance}\label{RF} The Robinson-Foulds (RF) distance calculates the distance between phylogenetic trees built in each sliding window ($T_1$) and the attributes trees ($T_2$) (see the list in the first step of the \autoref{aPhyloGeo-software}) \citep{tahiri2018new, koshkarov_phylogeography_2022}. This measurement is used to evaluate the topological differences between the two sets of trees (see Equation \eqref{eq:rf} and \autoref{lst:robinsonFoulds}). -For example, it evaluates the number of division differences between phylogenetic trees built within certain user-defined sliding windows (see the second step of the \autoref{aPhyloGeo-software}) and geographic trees built with latitude data (DD) at the end of sampling \citep{robinson_comparison_1981}. 
A high distance between a specific window and other windows considered in the RF distance analysis may imply that the habitat feature has little to no impact on the evolution of this particular DNA sequence and that the fluctuation of this attribute might not explain the genetic divergences observed. +For example, it evaluates the number of division differences between phylogenetic trees built within certain user-defined sliding windows (see the second step of the \autoref{aPhyloGeo-software}) and spatial trees built with latitude data (DD) at the end of sampling \citep{robinson_comparison_1981}. A high distance between a specific window and other windows considered in the RF distance analysis may imply that the habitat feature has little to no impact on the evolution of this particular DNA sequence and that the fluctuation of this attribute might not explain the genetic divergences observed. \begin{equation}\label{eq:rf} \text{RF}(T_1, T_2) = | \Sigma(T_1) \Delta \Sigma(T_2) | @@ -275,7 +275,7 @@ \subsubsection{Robinson-Foulds distance}\label{RF} \end{lstlisting} \subsubsection{Normalized Robinson-Foulds distance}\label{RFnorm} -The normalized Robinson-Foulds (nRF) distance scales the RF distance to account for the size variations in the trees (number of clades; i.e., a group of species with a common origin), allowing a more equitable comparison. It scales the distance to a range between 0 and 1. In our context, the distance has been normalized by $2n-6$, where $n$ represents the number of taxa (see Equation \eqref{eq:rf_norm} and the last line of code in \autoref{lst:robinsonFoulds}). +The normalized Robinson-Foulds (nRF) distance scales the RF distance to account for the size variations in the trees (number of clades; i.e., a group of species with a common origin), allowing a more equitable comparison. It scales the distance to a range between 0 and 1. 
In our context, the distance has been normalized by $2n-6$, where $n$ represents the number of taxa (see Equation \eqref{eq:rf_norm} and the last line of code in \autoref{lst:robinsonFoulds}). Since the size of environmental trees constructed with O\textsubscript{2} concentration data (mg/L) differs from that of other attributes due to missing data, this nRF distance allows us to compare its dissimilarity with the phylogenetic trees in a fairer way \citep{tahiri2018new, koshkarov_phylogeography_2022}. It reveals the relative influence of O\textsubscript{2} concentration (mg/L) on Cumacea phylogenetic relationships, independent of tree size \citep{tahiri2018new, koshkarov_phylogeography_2022}. A high value of this metric between a specific window and other windows considered in the nRF distance analysis suggests that we cannot conclude that there is a correlation between this DNA sequence and the attribute. It may indicate a topological dissimilarity between the habitat attribute tree and the gene trees at that position in the DNA sequence alignments. @@ -319,21 +319,21 @@ \subsubsection{Euclidean distance}\label{euclidean} # Load the first tree from Newick format into a dendropy Tree object # Analyzes the string formatted by Newick and prepares the tree for comparison. tree1_tc = dendropy.Tree.get( - data=tree1.format("newick"), - schema="newick", + data=tree1.format("newick"), + schema="newick", taxon_namespace=tns ) # Load the second tree from Newick format into a dendropy Tree object # Similar to the first tree, this step prepares the second tree for comparison. 
tree2_tc = dendropy.Tree.get( - data=tree2.format("newick"), - schema="newick", + data=tree2.format("newick"), + schema="newick", taxon_namespace=tns ) # Encode the bipartitions of both trees - # This step converts the trees into a format where the presence or absence of + # This step converts the trees into a format where the presence or absence of # Each bipartition (split) is coded, which is necessary to calculate distances. tree1_tc.encode_bipartitions() tree2_tc.encode_bipartitions() @@ -377,7 +377,7 @@ \section{Results}\label{results} \caption{Cumacea frequency distribution by species and family. The percentages (\%) displayed above the bars indicate the relative abundance of each species in the total sample. Unlike less common species, those that are abundant (such as \emph{Leptostylis ampullacea} and \emph{Leucon pallidus}) may have adaptive characteristics that enable them to exploit resources more easily, resist interspecific competition or withstand changing biological conditions. \label{fig:fig3}} \end{figure} -The distribution and diversity of the various Cumacea species found in our sample are shown in Figure \ref{fig:fig3}. It shows that the most represented species are \emph{Leptostylis ampullacea} (14.1\%) and \emph{Leucon pallidus} (12.5\%). In contrast, species like \emph{Bathycuma brevirostre} and \emph{Styloptocuma gracillimum} are less represented (1.6\%), implying that some species may have restricted ecological niches or face ecological forces that limit their distribution. The dominance of certain species (such as \emph{Leptostylis ampullacea} and \emph{Leucon pallidus}) suggests that they may have adaptive traits that enable them to make the most of the accessible resources, resist interspecific competition, or survive in fluctuating ecosystemic conditions, aligns with our study’s aim of relating genetic adaptation to habitat characteristics. 
+The distribution and diversity of the various Cumacea species found in our sample are shown in Figure \ref{fig:fig3}. It shows that the most represented species are \emph{Leptostylis ampullacea} (14.1\%) and \emph{Leucon pallidus} (12.5\%). In contrast, species like \emph{Bathycuma brevirostre} and \emph{Styloptocuma gracillimum} are less represented (1.6\%), implying that some species may have restricted ecological niches or face ecological forces that limit their distribution. The dominance of certain species (such as \emph{Leptostylis ampullacea} and \emph{Leucon pallidus}) suggests that they may have adaptive traits that enable them to make the most of the accessible resources, resist interspecific competition, or survive in fluctuating ecosystemic conditions, which aligns with our study’s aim of relating genetic adaptation to habitat characteristics. \begin{figure}[htbp] \centering @@ -385,7 +385,7 @@ \section{Results}\label{results} \caption{Distribution of Cumacea families by water mass. This histogram represents the frequency of occurrence of the different Cumacea families in our samples, classified according to the water mass in which they were collected. Eight water mass categories are represented: Arctic Polar Water (APW), Arctic Polar Water/North Sub-Arctic Intermediate Water (APW/NSAIW), Iceland Scotland Overflow Water (ISOW), Labrador Sea Water (LSW), North Atlantic Water (NAW), North Sub-Arctic Intermediate Water (NSAIW), cold North Sub-Atlantic Deep Water (NSDWc), and warm North Sub-Atlantic Deep Water (NSDWw). Seven families are represented: Astacidae (red), Bodotriidae (brown), Ceratocumatidae (green), Diastylidae (turquoise), Lampropidae (blue), Leuconidae (purple), and Pseudocumatidae (pink). The presence of the Diastylidae (turquoise) family in the majority of water bodies (APW, APW/NSAIW, ISOW, NSAIW, NSDWc, and NSDWw) accentuates the resilience and ecological acclimatization of this family to various ecological niches and conditions. 
\label{fig:fig4}} \end{figure} -The following figure supports the objective of our study by showing the distribution of the various Cumacea families in the different water bodies (see Figure \ref{fig:fig4}). The Diastylidae family, for example, is the most common in all water bodies (turquoise color in Figure \ref{fig:fig4}), testifying to its resilience and ecological adaptability to a wide variety of habitat conditions, reminiscent of the dominance of \emph{Leptostylis ampullacea} (see Figure \ref{fig:fig3}, 14.1\%) which belongs to the Diastylidae family. +Figure \ref{fig:fig4} supports the objective of our study by showing the distribution of the various Cumacea families in the different water bodies. The Diastylidae family, for example, is the most common in all water bodies (turquoise color in Figure \ref{fig:fig4}), testifying to its resilience and ecological adaptability to a wide variety of habitat conditions. This is reminiscent of the dominance of \emph{Leptostylis ampullacea} (14.1\%; see Figure \ref{fig:fig3}), which belongs to the Diastylidae family. \begin{figure}[] \centering @@ -416,7 +416,7 @@ \section{Conclusion}\label{conclusion} The novelty in our research lies in the exhaustive divergence between habitat attributes and genetic mutability in Cumacea, particularly in identifying genetic windows associated with habitat fluctuations, which has not been widely investigated in previous studies \citep{manel2003landscape, vrijenhoek2009cryptic}. In this case, our integrated method identifies specific genetic regions sensitive to ecosystemic and atmospheric variations. Thus, by seeking to determine which of these two attributes diverges most with the DNA sequences, the eventual identification of proteins linked to one of these variable DNA sequences will make it possible to represent its functional effects in responses to habitat changes. 
Our future research will focus on verifying the prediction of this protein and assessing its role in the physiological adaptation of Cumacea to fluctuating conditions, adding a link between genetic data and ecological function. -Interpreting how marine invertebrates genetically adapt to variations in their habitat can help us better predict their responses to climate change and advance conservation plans to protect them. Identifying the specific attributes that influence the genetic variability of Cumacea can contribute to the designation and supervision of marine protected areas, assuring they include habitats crucial to the survival and acclimatization of these species. Thus, our results can inform the management of fishing and seabed mining companies by revealing ecologically vulnerable areas where these disturbances can seriously affect benthic biodiversity. +Understanding how marine invertebrates genetically adapt to variations in their habitat can help us better predict their responses to climate change and advance conservation plans to protect them. Identifying the specific attributes that influence the genetic variability of Cumacea can contribute to the designation and supervision of marine protected areas, ensuring that they include habitats crucial to the survival and acclimatization of these species. Thus, our results can inform the management of fishing and seabed mining companies by revealing ecologically vulnerable areas where these disturbances can seriously affect benthic biodiversity. Furthermore, our results provide essential knowledge to guide future studies on the genetic adaptation of Cumacea and other invertebrates to ecological and regional variability. 
Based on these findings, future research should focus on additional ecosystemic and meteorological attributes, such as nutrient accessibility, water pH, ocean currents, and the degree of human disturbance, to further improve the interpretation of the complex interactions between genetics and the environment. Broadening the scope of application to other marine species, not just marine invertebrates, and diverse geographic regions would allow us to generalize the results more effectively. With this in mind, longitudinal study models on these species could reflect long-term climatic and biological fluctuations and improve our knowledge of the dynamics of genetic acclimatization.