A permissive synthetic data library from Gretel.ai
If you want to quickly discover gretel-synthetics, simply click the button below and follow the tutorials!
Check out additional examples here.
This section will guide you through the installation of gretel-synthetics and of dependencies that are not installed directly by the Python package manager.
By default, we do not install certain core requirements. The following dependencies should be installed separately from gretel-synthetics, depending on which model(s) you plan to use.
- Torch: Used by Timeseries DGAN and ACTGAN (for ACTGAN, Torch is installed by SDV). We recommend version 2.0.
- SDV (Synthetic Data Vault): Used by ACTGAN. We recommend version 0.17.x.
These dependencies can be installed by doing the following:
pip install "sdv<0.18" # for ACTGAN
pip install torch==2.0 # for Timeseries DGAN
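After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library. The helper names `installed_versions` and `matches_series` are hypothetical and not part of gretel-synthetics, and the version series simply mirror the recommendations listed above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def matches_series(version, series):
    """Return True if `version` is `series` itself or a release within it."""
    if version is None:
        return False
    # Drop any local build suffix such as "+cu118" before comparing.
    public = version.split("+", 1)[0]
    return public == series or public.startswith(series + ".")


if __name__ == "__main__":
    # Recommended series taken from the list above (assumption: adjust as needed).
    recommended = {"torch": "2.0", "sdv": "0.17"}
    found = installed_versions(recommended)
    for name, series in recommended.items():
        status = "ok" if matches_series(found[name], series) else "check"
        print(f"{name}: installed={found[name]} recommended={series}.x [{status}]")
```

A package reported as `installed=None` is simply not present, which is expected if you do not plan to use the model that needs it.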
To install the actual gretel-synthetics package, first clone the repo and then...
pip install -U .
or
pip install gretel-synthetics
then...
pip install jupyter
jupyter notebook
When the UI launches in your browser, navigate to examples/synthetic_records.ipynb and get generating!
If you want to install gretel-synthetics locally and use a GPU (recommended):
- Create a virtual environment (e.g. using conda)
conda create --name tf python=3.9
- Activate the virtual environment
conda activate tf
- Run the setup script
./setup-utils/setup-gretel-synthetics-tensorflow24-with-gpu.sh
The last step will install all the necessary software packages for GPU usage, along with tensorflow=2.8 and gretel-synthetics.
Note that this script works only for Ubuntu 18.04. You might need to modify it for other OS versions.
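Since the script targets Ubuntu 18.04, you may want to check the host OS before running it. Below is a rough sketch that reads `/etc/os-release`, the file most modern Linux distributions use to describe themselves. The helpers `parse_os_release` and `is_ubuntu_1804` are hypothetical names introduced here for illustration; they are not part of the setup script or of gretel-synthetics.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping optional quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_ubuntu_1804(fields):
    """Return True when the parsed fields describe Ubuntu 18.04."""
    return fields.get("ID") == "ubuntu" and fields.get("VERSION_ID") == "18.04"


if __name__ == "__main__":
    try:
        with open("/etc/os-release", encoding="utf-8") as handle:
            fields = parse_os_release(handle.read())
    except OSError:
        # File missing (e.g. non-Linux host): treat the OS as unknown.
        fields = {}
    if is_ubuntu_1804(fields):
        print("Ubuntu 18.04 detected; the setup script should apply as-is.")
    else:
        print("Different or unknown OS; the setup script may need changes.")
```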
The Timeseries DGAN module contains a PyTorch implementation of a DoppelGANger model that is optimized for timeseries data. Similar to TensorFlow, you will need to manually install PyTorch:
pip install torch==2.0
This notebook shows basic usage on a small data set of smart home sensor readings.
ACTGAN (Anyway CTGAN) is an extension of the popular CTGAN implementation that provides some additional functionality to improve memory usage, autodetection and transformation of columns, and more.
To use this model, you will need to manually install SDV:
pip install "sdv<0.18"
Keep in mind that this will also install several dependencies like PyTorch that SDV relies on, which may conflict with PyTorch versions installed for use with other models like Timeseries DGAN.
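To see what SDV asks for before it changes your PyTorch install, one option is to inspect the requirements recorded in its package metadata. This is only a sketch built on the standard library's `importlib.metadata.requires`. The helper names are hypothetical, the name matching is deliberately simple (lower-casing and treating underscores as hyphens), and the distribution names `sdv` and `torch` are assumed from the install commands above.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the project name from a requirement string, lower-cased."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1).lower().replace("_", "-") if match else None


def requirements_for(requirements, project):
    """Keep only the requirement strings whose project name equals `project`."""
    return [req for req in requirements if requirement_name(req) == project]


def declared_torch_requirements(dist_name="sdv"):
    """Return the torch requirements declared by `dist_name`, or None if it is not installed."""
    try:
        declared = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None
    return requirements_for(declared, "torch")


if __name__ == "__main__":
    found = declared_torch_requirements("sdv")
    if found is None:
        print("sdv is not installed in this environment.")
    else:
        print("torch requirements declared by sdv:", found)
```

Note that this only reports SDV's direct requirements. If PyTorch is pulled in through another package that SDV depends on, the list will be empty and you would need to inspect that package instead.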
The ACTGAN interface is a superset of the CTGAN interface. To see the additional features, please take a look at the ACTGAN demo notebook in the examples directory of this repo.