Variational Autoencoders (VAE) and Vanilla Generative Adversarial Networks (GAN) for synthetic data generation in molecular life sciences

Insufficient training data can cause machine learning models to underfit or overfit, because the models lack the statistical power to capture the true signals in the data. Additionally, the ground-truth values of real data are generally unknown, which limits how well the models can be interpreted. There is therefore a growing need for synthetic data to address these challenges. Modern deep learning methods, including generative models, can generate synthetic data that helps improve both the performance and the interpretability of machine learning models. In this study, we compared two generative models, Variational Autoencoders (VAEs) and vanilla Generative Adversarial Networks (GANs), on the task of generating synthetic tabular data. Although GANs have been successful at generating synthetic image data, they have struggled with tabular data. To gain a deeper understanding of the generative models, we used simulated datasets in which technical noise was introduced in the form of batches and signal in the form of groups. VAEs outperformed vanilla GANs at capturing all the complex patterns, including the most diverse genes introduced during the data simulation. This project was carried out as part of a machine learning course in my master's curriculum.
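For readers new to the two model families, the sketch below writes out the training objectives being compared, in plain Python with no deep learning framework. It is a minimal illustration under stated assumptions, not code taken from the notebook: the squared-error reconstruction term (a Gaussian decoder with fixed variance), the non-saturating generator loss, and all function names are assumptions chosen for clarity.

```python
import math
import random

# --- VAE side (assumed Gaussian encoder and decoder) ---

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: squared-error reconstruction (assumption) plus the KL term."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + gaussian_kl(mu, logvar)


# --- Vanilla GAN side ---

def bce(p, target):
    """Binary cross-entropy for a single probability, clipped to avoid log(0)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Discriminator objective: label real samples 1 and generated samples 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Non-saturating generator loss (assumption): low when D(G(z)) is close to 1."""
    return bce(d_fake, 1.0)


rng = random.Random(0)
z = reparameterize([0.0, 0.5], [0.0, -1.0], rng)
print(len(z))                                   # 2
print(gaussian_kl([1.0], [0.0]))                # 0.5
print(round(discriminator_loss(0.5, 0.5), 3))   # 1.386 (= 2 ln 2)
```

The structural difference is visible in the code: the VAE minimises a single objective, while the GAN alternates between two opposing losses, one for the discriminator and one for the generator.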

The "R_script_to_generate_data.R" file contains the script used to simulate the data.
The "DATASETS" folder contains the 10 simulated datasets.
The "VAE_GAN_SYNTH_DATA_GENERATION_CODE.ipynb" notebook contains the code to reproduce the results.
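The R script is not reproduced here. As a rough, hypothetical Python analogue of the simulation idea described above (group-specific signal plus batch-specific technical noise on top of per-gene baseline expression), the sketch below draws Poisson counts for a cells-by-genes matrix. The function `simulate_counts`, its parameters and their default values are invented for illustration and do not describe the project's actual simulator.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_cells, n_genes, n_groups, n_batches, seed=0,
                    base_mean=5.0, de_prob=0.1, de_fold=3.0, batch_sd=0.2):
    """Hypothetical toy simulator: returns (counts, group labels, batch labels)."""
    rng = random.Random(seed)
    # Baseline mean expression for each gene.
    gene_mean = [base_mean * rng.lognormvariate(0.0, 0.5) for _ in range(n_genes)]
    # Signal: each group up-regulates a random subset of genes by de_fold.
    group_factor = [[de_fold if rng.random() < de_prob else 1.0 for _ in range(n_genes)]
                    for _ in range(n_groups)]
    # Technical noise: each batch rescales every gene by a small random factor.
    batch_factor = [[rng.lognormvariate(0.0, batch_sd) for _ in range(n_genes)]
                    for _ in range(n_batches)]
    counts, groups, batches = [], [], []
    for _ in range(n_cells):
        g = rng.randrange(n_groups)
        b = rng.randrange(n_batches)
        row = [poisson(gene_mean[j] * group_factor[g][j] * batch_factor[b][j], rng)
               for j in range(n_genes)]
        counts.append(row)
        groups.append(g)
        batches.append(b)
    return counts, groups, batches


counts, groups, batches = simulate_counts(n_cells=100, n_genes=50, n_groups=2, n_batches=2)
print(len(counts), len(counts[0]))  # 100 50
```

Because groups and batches are known by construction, a generative model trained on such data can be checked against the ground truth, which is the motivation for using simulated data given in the introduction.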
The performance of the VAE in terms of precision and recall:
The performance of the vanilla GAN in terms of precision and recall:
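The README does not state which definition of precision and recall was used. One common choice for generative models is the k-nearest-neighbour manifold estimate of Kynkäänniemi et al. (2019): precision is the fraction of synthetic samples that fall inside the estimated manifold of the real data, and recall is the fraction of real samples that fall inside the estimated manifold of the synthetic data. The sketch below implements that definition in plain Python as an assumption about the metric, with hypothetical function names and toy data.

```python
import math


def kth_nn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour in the same set (itself excluded)."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(queries, support, k):
    """Fraction of queries lying inside at least one k-NN ball centred on a support point."""
    radii = kth_nn_radii(support, k)
    inside = sum(
        any(math.dist(q, s) <= r for s, r in zip(support, radii))
        for q in queries
    )
    return inside / len(queries)


def precision_recall(real, fake, k=3):
    precision = coverage(fake, real, k)  # share of synthetic samples that look realistic
    recall = coverage(real, fake, k)     # share of real data covered by synthetic samples
    return precision, recall


real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
fake_good = [(0.1, 0.1), (0.9, 0.9), (0.5, 0.4), (0.2, 0.8)]
fake_collapsed = [(0.5, 0.5)] * 4  # every synthetic sample is identical

print(precision_recall(real, fake_good))       # (1.0, 1.0)
print(precision_recall(real, fake_collapsed))  # (1.0, 0.2)
```

The collapsed generator scores perfect precision but low recall, the signature of mode collapse: every synthetic sample is realistic, yet together they cover only a small part of the real distribution. Each set needs more than k points for the radii to be defined.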

License

This project is licensed under the MIT License.

Sowgandh6/Generative-models-for-synthetic-data-in-molecular-life-sciences

Folders and files

Latest commit

History

Repository files navigation

Variational Autoencoders(VAE) and Vanilla Generative Adversial Networks(GAN) for synthetic data generation in molecular life sciences

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages