Welcome to the hands-on part of the tutorial for Innovative Uses of Synthetic Data! In this repository you will find the materials required to complete the Lab.
We will run the hands-on session using Google Colab, so there is no need to pre-install any libraries or download any datasets.
In-person participants, please bring a fully charged laptop to the session. There may not be enough power sockets for everyone at the venue, and any available sockets are allocated on a first-come, first-served basis.
The Lab is based on the open-source Python library synthcity. To make the most of the Lab, we recommend that participants explore the library beforehand. Here is a list of useful materials:
- The whitepaper on synthcity and innovative uses of synthetic data
- The NeurIPS 2023 paper
- The GitHub repository of synthcity
- The synthcity documentation
Session Title | Description |
---|---|
Data Modality | We demonstrate how synthcity can generate tabular data with diverse modalities, including static data, regular and irregular time series, data with censoring, multi-source data, and composite data. |
Fairness | We show how synthetic data can promote ML fairness by (1) augmenting minority classes with conditional generation and (2) removing bias via causal generation. |
Privacy | We introduce privacy-preserving synthetic data generators that facilitate the sharing of sensitive data. We will cover differential privacy-based methods as well as methods that defend against specific threat models. |
Transfer | We show how to alleviate data scarcity by augmenting a small dataset using information learned from other related datasets in a transfer learning style. |
Further Engagement | We discuss ways of further engaging with the application and development of synthcity. |
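
To make the first idea in the Fairness session concrete, here is a minimal, standard-library-only sketch of augmenting a minority class with conditional generation. It is not synthcity's API and not the Lab's code: the function names (`fit_class_conditional`, `sample_conditional`, `balance_with_synthetic`) are hypothetical, the toy rows are made up, and a per-class independent Gaussian stands in for a real conditional generator.

```python
import random
import statistics
from collections import Counter, defaultdict


def fit_class_conditional(rows, labels):
    """Estimate a per-class (mean, std) pair for every feature column."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    params = {}
    for label, members in by_class.items():
        columns = list(zip(*members))  # transpose rows into feature columns
        params[label] = [
            (statistics.fmean(col), statistics.pstdev(col)) for col in columns
        ]
    return params


def sample_conditional(params, label, count, rng):
    """Draw `count` synthetic rows conditioned on the requested class label."""
    return [
        [rng.gauss(mean, std) for mean, std in params[label]]
        for _ in range(count)
    ]


def balance_with_synthetic(rows, labels, seed=0):
    """Top up each class with synthetic rows until it matches the largest class."""
    rng = random.Random(seed)
    params = fit_class_conditional(rows, labels)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, n in counts.items():
        extra = sample_conditional(params, label, target - n, rng)
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels


# Toy imbalanced dataset (assumed values): 8 majority rows, 2 minority rows.
rows = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2],
        [1.0, 1.8], [0.8, 2.0], [1.3, 2.1], [1.1, 1.9],
        [3.0, 0.5], [3.2, 0.7]]
labels = ["majority"] * 8 + ["minority"] * 2

balanced_rows, balanced_labels = balance_with_synthetic(rows, labels)
print(Counter(balanced_labels))
```

The real rows are kept untouched and only the under-represented class receives synthetic rows, so a downstream model sees balanced classes. In the Lab itself, a trained generative model would play the role that the Gaussian sampler plays here.
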
The interactive tutorial for the session is available here.
Download and use the library - join the development. Raise issues and open pull requests.
If you've enjoyed the Lab, why not star synthcity on GitHub? The easiest way to help our community is to star the repos! This helps raise awareness of the tools we're building.
Sign up to our synthetic data mailing list to stay up to date on news about the SyntheticData4ML community. We will post about upcoming tutorials, workshops, competitions, hackathons and more!
Join our Machine Learning Engagement sessions, "Inspiration Exchange", for discussions of our research projects and software, such as synthcity. Sign up here.