Role of genomics on regulating rice grain metabolic variability under warmer nights: A statistical and image-based deep learning approach

yebigithub/GBLUP4Met

Genomic prediction of metabolic content in rice grain in response to warmer night conditions

Preprint: link

Abstract

It has been argued that metabolic content can be used as a selection marker to accelerate crop improvement because metabolic profiles in crops are often under genetic control. Evaluating the role of genetics in metabolic variation is a long-standing challenge. Rice, one of the world's most important staple crops, is known to be sensitive to recent increases in nighttime temperatures. Quantification of metabolic levels can help measure rice responses to high night temperature (HNT) stress. However, the extent of metabolic variation that can be explained by regression on whole-genome molecular markers remains to be evaluated. In the current study, we generated metabolic profiles for mature grains from a subset of rice diversity panel accessions grown under optimal and HNT conditions. Metabolite accumulation was low to moderately heritable, and genomic prediction accuracies of metabolite accumulation were within the expected upper limit set by their genomic heritability estimates. Genomic heritability estimates were slightly higher in the control group than in the HNT group. Genomic correlation estimates for the same metabolite accumulation between the control and HNT conditions indicated the presence of genotype-by-environment interactions. Reproducing kernel Hilbert spaces regression and image-based deep learning improved prediction accuracy, suggesting that some metabolite levels are under non-additive genetic control. Joint analysis of multiple metabolite accumulation simultaneously was effective in improving prediction accuracy by exploiting correlations among metabolites. The current study serves as an important first step in evaluating the cumulative effect of markers in influencing metabolic variation under control and HNT conditions.

0. Data Preprocessing

  • .Rmd file Cleaning metabolite and genotype data.

1. Genomic heritability of metabolites

  • .R file Using the sommer package to estimate the genomic heritability of metabolites.
  • .Rmd file Drawing heritability plots.

Figure 2: Genomic heritability estimates of metabolite accumulation in control and high night temperature stress conditions. A) Scatter plot. B) Density plot. Solid and dashed lines indicate mean and median, respectively. C) Agreement of heritability estimates between control and high night temperature stress conditions. Metabolites in green and red indicate that the heritability difference between control and high night temperature stress conditions was small (< 0.05) and large (> 0.1), respectively.
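As a rough illustration of the quantities behind this step, the sketch below builds a genomic relationship matrix from marker data and turns variance components into a genomic heritability. It is not the repository's code, which uses sommer in R. The VanRaden-style scaling, the 0/1/2 allele-count coding, and the function names `vanraden_grm` and `genomic_heritability` are all assumptions made for this example.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an n x m matrix of allele counts (0/1/2).

    Assumes VanRaden-style centering and scaling and at least one
    polymorphic marker (otherwise the denominator is zero).
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0              # allele frequency per marker
    Z = M - 2.0 * p                       # center each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))   # scaling constant
    return Z @ Z.T / denom

def genomic_heritability(var_g, var_e):
    """Proportion of total variance explained by the markers."""
    return var_g / (var_g + var_e)

# Toy data: 6 hypothetical accessions, 50 hypothetical markers.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(6, 50))
G = vanraden_grm(M)
h2 = genomic_heritability(var_g=1.0, var_e=3.0)
```

In practice the variance components would come from a mixed-model fit (for example REML) rather than being supplied by hand as in this toy call.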

2. Single-trait genomic prediction of metabolites

  • .R file Running single-trait GBLUP on a cluster.
  • .Rmd file Drawing single-trait GBLUP plots.
  • .Rmd file Selecting a suitable bandwidth for RKHS.
  • .R file Running single-trait RKHS on a cluster.

Figure 4: Genomic prediction accuracy of metabolite accumulation in control and high night temperature stress conditions. A) Box plot. The horizontal line indicates the mean value. B) Density plot. The solid and dashed lines indicate the mean and median, respectively. C) Agreement of genomic prediction accuracy between control and high night temperature stress conditions. Metabolites in green and red indicate that the difference in genomic prediction accuracy between control and high night temperature stress conditions was small (< 0.05) and large (> 0.1), respectively.
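The sketch below shows one common way to set up RKHS-style prediction: a Gaussian kernel on marker distances with a bandwidth parameter, followed by a kernel BLUP for held-out accessions. It is an illustration only, not the repository's R code. The distance scaling, the treatment of the variance ratio `lam` as known, and the names `gaussian_kernel` and `kernel_blup_predict` are assumptions.

```python
import numpy as np

def gaussian_kernel(M, bandwidth):
    """Gaussian kernel exp(-bandwidth * d_ij) on scaled squared distances.

    Distances are divided by their maximum so that the bandwidth is
    comparable across marker sets (an assumed convention). Requires at
    least two distinct genotypes.
    """
    M = np.asarray(M, dtype=float)
    sq = np.sum(M ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    D = np.maximum(D, 0.0)   # remove tiny negatives from rounding
    D = D / D.max()
    return np.exp(-bandwidth * D)

def kernel_blup_predict(K, y, train, test, lam):
    """Predict test phenotypes from training phenotypes.

    lam is the ratio of residual to genetic variance, assumed known here.
    """
    y = np.asarray(y, dtype=float)
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

# Toy data: 8 hypothetical accessions, 40 hypothetical markers.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(8, 40))
y = rng.normal(size=8)
train, test = np.arange(6), np.arange(6, 8)
K = gaussian_kernel(M, bandwidth=0.5)
y_hat = kernel_blup_predict(K, y, train, test, lam=1.0)
```

Passing a linear genomic relationship matrix as `K` instead of the Gaussian kernel gives the corresponding GBLUP-style prediction. A bandwidth could then be chosen by comparing cross-validated accuracy over a small grid of values.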

3. Genomic correlation between the same metabolite in different treatments

  • .R file Running multi-trait genomic correlation.
  • .Rmd file Drawing multi-trait genomic correlation plots.

Figure 3: Genomic correlation estimates between the same metabolite accumulation measured under control and high night temperature stress conditions. A) Scatter plot. B) Bar chart. Solid and dashed lines indicate mean and median, respectively.

4. Exploratory factor analysis

  • .Rmd file Factor analysis to identify the underlying latent factors controlling metabolites.

5. Simultaneous regression modeling of metabolites

  • .R file Running MegaLMM for genomic prediction.
  • .R file Running MegaLMM for RKHS.
  • .Rmd file Drawing bar plots and density plots for the MegaLMM genomic prediction model.
  • .Rmd file Drawing genomic correlation density plot.

Figure 7: Genomic correlation estimates between different metabolite accumulation in control and high night temperature stress conditions. The solid and dashed lines indicate mean and median, respectively.

Figure 8: Percentage difference of gain in prediction accuracy for multi-trait genomic best linear unbiased prediction (MegaLMM-G) and multi-trait reproducing kernel Hilbert spaces regression (MegaLMM-GK) relative to single-trait genomic best linear unbiased prediction (A). Density plots of percentage difference are shown for MegaLMM-G (B) and MegaLMM-GK (C).
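A percentage gain relative to a baseline model can be computed as below. The exact formula used for the figures is not stated in this README, so the relative-difference definition and the name `percent_gain` are assumptions.

```python
import numpy as np

def percent_gain(acc_model, acc_baseline):
    """Percentage gain in accuracy of a model over a baseline.

    Assumed definition: 100 * (model - baseline) / baseline.
    """
    acc_model = np.asarray(acc_model, dtype=float)
    acc_baseline = np.asarray(acc_baseline, dtype=float)
    if np.any(acc_baseline == 0):
        raise ValueError("baseline accuracy must be non-zero")
    return 100.0 * (acc_model - acc_baseline) / acc_baseline

# Hypothetical accuracies for two metabolites.
gains = percent_gain([0.55, 0.30], [0.50, 0.40])
```

With this definition the sign of the result is only easy to interpret when the baseline accuracy is positive.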

6. Deep learning models

  • .ipynb file Examples of converting SNP tabular data into SNP images.
  • .py file Looping the conversion over SNPs on all chromosomes.
  • .py file Convolutional neural network with multiple branches.
  • .Rmd file Drawing bar plots to compare the performance of all deep learning models and RKHS.

Figure 1: Flowchart of converting single nucleotide polymorphisms to image data.



Figure 5: Example of a set of single nucleotide polymorphisms transformed into image data for a randomly selected genotype. Images of 12 chromosomes were processed in the multi-channel convolutional neural networks.
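One simple way to turn tabular SNP data into per-chromosome images is sketched below. This README does not describe the conversion used in the notebook, so the approach here is purely an assumption: allele counts are scaled to [0, 1], zero-padded, and reshaped into a square grayscale array. The names `snps_to_image` and `genotype_to_images` are hypothetical.

```python
import math
import numpy as np

def snps_to_image(snps):
    """Reshape a vector of allele counts (0/1/2) into a square image.

    Values are scaled to [0, 1]; trailing pixels are zero-padded.
    """
    snps = np.asarray(snps, dtype=np.float32)
    side = math.ceil(math.sqrt(snps.size))
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: snps.size] = snps / 2.0
    return padded.reshape(side, side)

def genotype_to_images(snps, chrom):
    """Build one image per chromosome for a single genotype."""
    snps = np.asarray(snps)
    chrom = np.asarray(chrom)
    return {int(c): snps_to_image(snps[chrom == c]) for c in np.unique(chrom)}

# Toy genotype: 6 hypothetical SNPs on 2 hypothetical chromosomes.
img = snps_to_image([0, 1, 2, 2, 1])
images = genotype_to_images([0, 1, 2, 2, 1, 0], [1, 1, 1, 1, 2, 2])
```

Each per-chromosome image could then feed one branch of a multi-branch convolutional network. Chromosomes with different SNP counts yield different image sizes, so a resizing or common-padding step would be needed before batching.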



Figure 6: Percentage difference of gain in prediction accuracy for single-trait reproducing kernel Hilbert spaces regression (RKHS), VGG16, ResNet50, EfficientNetB7, InceptionV3, MobileNetV2, and DenseNet201 relative to single-trait genomic best linear unbiased prediction.

7. Supplementary

  • .Rmd file Calculating phenotypic correlation between metabolites in control and stress conditions.
  • .Rmd file Drawing MegaLMM genomic correlation heatmaps.
  • .Rmd file Drawing factor analysis heatmaps.
  • .Rmd file Drawing factor analysis density plots.
