Access to synthetic data helps you make better, data-informed decisions in situations where your data is imbalanced, scant, poor-quality, unobservable, or restricted. This demo helps you create a synthetic dataset based on a census dataset.
This demo uses data from the Adult Census Bureau database, made available here and released under the CC-0 licence. The data was extracted from the 1994 Census Bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics).
The notebook contains steps to pull the data.
- An active SAS Viya Workbench license and environment.
- Python packages required:
  - pmlb - used for fetching datasets from the Penn Machine Learning Benchmark.
  - tqdm - used for progress bar functionality during iterations.
Install packages using the following command:
pip install pmlb tqdm
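As a rough sketch of how these two packages might be used together, the snippet below fetches the census data with pmlb's `fetch_data` and wraps a simple per-column loop in a tqdm progress bar. The notebook remains the authoritative source for the actual steps. The function names `fetch_adult` and `count_unique_values`, the `pmlb_cache` directory, and the assumption that the notebook uses the PMLB dataset named `adult` are illustrative and not taken from the notebook.

```python
from pathlib import Path


def fetch_adult(cache_dir="pmlb_cache"):
    """Download the PMLB 'adult' dataset, or load it from a local cache.

    Assumption: the notebook's census data corresponds to the PMLB dataset
    named "adult". The cache directory name is hypothetical.
    """
    # Imported lazily so this module can be loaded before pmlb is installed.
    from pmlb import fetch_data

    # Make sure the cache directory exists before handing it to pmlb.
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    return fetch_data("adult", local_cache_dir=cache_dir)


def count_unique_values(df, progress=None):
    """Count distinct values per column, showing a progress bar while iterating.

    `progress` is any callable that wraps an iterable; it defaults to tqdm.
    """
    if progress is None:
        from tqdm import tqdm
        progress = tqdm

    counts = {}
    for col in progress(df.columns, desc="Profiling columns"):
        counts[col] = df[col].nunique()
    return counts
```

In a notebook cell, `adult = fetch_adult()` followed by `count_unique_values(adult)` would download the data on first use and read the cached copy on later runs.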
The Python notebook can be run as-is.
The output is a synthetically generated dataset, which you can optionally save to the output folder in this directory.
- Matt Gampe ([email protected])
- Version 1.0.0 (21NOV2024)
- Initial release on GitHub