Link to paper: 10.1038/s41467-022-32818-8
Figure. Predictor-guided generator optimization enables gene-specific navigation of the regulatory sequence-expression landscape. T-distributed stochastic neighbor embedding (t-SNE) mapping of the input latent subspaces that produce novel sequence variants spanning ~6 orders of magnitude of gene expression (colored and black dots), uncovered using predictor-guided generator optimization. Black dots represent selections of 10 sequence variants from each of the 4 expression groups, covering a 4 order-of-magnitude range of predicted expression levels from TPM ~10 to ~10,000.
Arrowheads were incorrectly rendered and are missing in the schematic Figures 1a, 1d, 3a and 6e. The correct panels are available in the 'docs' folder.
Figure 1a. Schematic depiction of sequence data and model training strategies.
Figure 1d. Overview of the generative adversarial network (GAN) approach.
Figure 3a. Schematic depiction of the procedure to optimize the generator.
Figure 6e. Schematic depiction of the mutagenesis strategy.
Scripts for training and optimizing ExpressionGAN, as well as for reproducing the analysis, are provided in the 'scripts' folder.
The data, including the generated sequence data, are available at . Extract the archive to a folder named 'data'.
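A minimal sketch of the extraction step in Python, assuming the download is a zip or tar archive that the standard library can unpack. The helper name `extract_data` and the archive file name in the usage comment are placeholders, not part of the repository.

```python
import shutil
from pathlib import Path


def extract_data(archive_path, target_dir="data"):
    """Unpack a downloaded archive into a folder named 'data' (by default)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # shutil.unpack_archive infers the format (zip, tar, tar.gz, ...)
    # from the file extension of archive_path.
    shutil.unpack_archive(str(archive_path), str(target))
    # Return the top-level entries so the resulting layout can be inspected.
    return sorted(p.name for p in target.iterdir())


# Hypothetical usage; replace the placeholder with the real archive name:
# extract_data("downloaded_archive.zip")
```

If the archive already contains a top-level folder, the files will land one level deeper than 'data', so it may be worth checking the returned listing before running the scripts.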
Software dependencies are specified in the environment files in the 'docs' folder, with env_training.yml used for GAN training and optimization and env_analysis.yml used for the data analysis.
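The choice between the two environment files can be sketched as follows, assuming the .yml files are conda environment specifications (the file names suggest this, but it is not stated here) and that commands are run from the repository root. The helper names and task labels are illustrative only.

```python
import subprocess
from pathlib import Path

# File names come from the text above; the dictionary keys are illustrative labels.
ENV_FILES = {
    "training": "env_training.yml",  # GAN training and optimization
    "analysis": "env_analysis.yml",  # data analysis
}


def conda_create_command(task, docs_dir="docs"):
    """Build the 'conda env create' command for the given task."""
    if task not in ENV_FILES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(ENV_FILES)}")
    env_file = Path(docs_dir) / ENV_FILES[task]
    # Assumption: the environment files are conda specifications.
    return ["conda", "env", "create", "-f", env_file.as_posix()]


def create_environment(task, docs_dir="docs"):
    """Run the command; requires conda on PATH and raises if conda fails."""
    subprocess.run(conda_create_command(task, docs_dir), check=True)
```

Calling `create_environment("training")` would then set up the environment for GAN training and optimization, and `create_environment("analysis")` the one for data analysis. Keeping the two separate avoids dependency conflicts between the training stack and the analysis tools.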