
The GitHub Repo for the hands-on session at NLDL entitled "Tutorial: Innovative Uses of Synthetic Data Tutorial".

vanderschaarlab/CCAIM-Synthetic-data-tutorial

Synthetic-data-tutorial

Welcome to the hands-on part of the tutorial for Innovative Uses of Synthetic Data! In this repository, you will find the materials required to complete the Lab.

Before the lab

We will run the hands-on session using Google Colab, so there is no need to pre-install any libraries or download any datasets.

In-person participants, please bring a fully charged laptop to the session, and bear in mind that there may not be enough power sockets for everyone at the venue. Any available sockets are allocated on a first-come, first-served basis.

The Lab is based on the open-source Python library synthcity. To make the most of the Lab, we recommend that participants explore the library beforehand. Here is a list of useful materials:

During the lab

We will cover the following sessions:

Data Modality: We demonstrate how synthcity can generate tabular data across diverse modalities, including static data, regular and irregular time series, censored data, multi-source data, and composite data.

Fairness: We show how synthetic data can promote ML fairness by (1) augmenting minority classes with conditional generation and (2) removing bias via causal generation.

Privacy: We introduce privacy-preserving synthetic data generators that facilitate the sharing of sensitive data. We will cover methods based on differential privacy as well as methods that defend against specific threat models.

Transfer: We show how to alleviate data scarcity by augmenting a small dataset with information learned from related datasets, in the style of transfer learning.

Further Engagement: We discuss ways to engage further with the application and development of synthcity.
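To give a rough feel for the Fairness idea of augmenting minority classes with conditional generation, here is a minimal sketch before the Lab. It is an assumption-laden toy, not synthcity's method or API: a single multivariate Gaussian fitted per class label stands in for a learned conditional generator, and the function names `fit_class_conditional_gaussians` and `augment_minority` are hypothetical. Each under-represented class is topped up with synthetic rows sampled only from its own fitted distribution, so that all classes end up the same size.

```python
import numpy as np


def fit_class_conditional_gaussians(X, y):
    """Fit one multivariate Gaussian per class label (a toy stand-in for a
    conditional generator). Assumes every class has at least two rows."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # Small ridge keeps the covariance positive definite for tiny classes.
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        params[label] = (mean, cov)
    return params


def augment_minority(X, y, rng=None):
    """Append synthetic rows to every smaller class until each class matches
    the size of the largest class. Real rows are kept unchanged at the top."""
    rng = np.random.default_rng(rng)
    params = fit_class_conditional_gaussians(X, y)
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for label, count in zip(labels, counts):
        n_new = int(target - count)
        if n_new == 0:
            continue
        mean, cov = params[label]
        # "Conditional" generation: sample only from this label's distribution.
        X_parts.append(rng.multivariate_normal(mean, cov, size=n_new))
        y_parts.append(np.full(n_new, label, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Usage: an imbalanced toy dataset with 90 majority rows and 10 minority rows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)), rng.normal(3.0, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_aug, y_aug = augment_minority(X, y, rng=1)
print(np.bincount(y_aug))  # [90 90]
```

A single Gaussian per class is far too simple for real tabular data with mixed types; in the Lab, the same augmentation step would instead be carried out with one of synthcity's generators.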

The interactive tutorial for the session is available here.

After the lab

Download and use the library, and join the development by raising issues and opening pull requests.

If you enjoyed the lab, why not star synthcity on GitHub? The easiest way to help our community is simply to star the repos! This helps raise awareness of the tools we're building.

Sign up to our synthetic data mailing list to stay up to date on news about the SyntheticData4ML community. We will post about upcoming tutorials, workshops, competitions, hackathons and more!

Join our Machine Learning Engagement sessions, "Inspiration Exchange", for discussions of our research projects and software, such as synthcity. Sign up here.
