Welcome to the AAAI-23 Lab for Innovative Uses of Synthetic Data! In this repository you will find the materials required to complete the Lab.
The Lab will start on Wednesday, 8 February 2023, 14:00 EST. It will be a four-hour hybrid event. Both physical and online participants need to register on the AAAI website to join the live session.
We will run the hands-on session using Google Colab, so there is no need to pre-install any library or download any dataset.
Physical participants, please bring a fully charged laptop for the four-hour session. The event host has informed us that there will not be enough power sockets for everyone at the venue, and the available sockets are allocated on a first-come, first-served basis.
The Lab is based on the open-source Python library synthcity. To make the most of the Lab, we recommend that participants explore the library beforehand. Here is a list of useful materials:
- The whitepaper on synthcity and innovative uses of synthetic data
- The GitHub repository of synthcity
- The synthcity documentation
Please note that all times are reported in EST (UTC-05:00).
Start | End | Session Title | Description |
---|---|---|---|
2:00pm | 2:30pm | Opening and Intro | We go through the promise of synthetic data in empowering AI development and the associated challenges. |
2:30pm | 3:15pm | Data Modality | We demonstrate how synthcity can generate tabular data with diverse modalities, including static data, regular and irregular time series, data with censoring, multi-source data, and composite data. |
3:15pm | 3:30pm | Q&A | |
3:30pm | 4:00pm | Break | |
4:00pm | 4:30pm | Fairness | We show how synthetic data can promote ML fairness by (1) augmenting minority classes with conditional generation and (2) removing bias via causal generation. |
4:30pm | 5:00pm | Privacy | We introduce privacy-preserving synthetic data generators that facilitate the sharing of sensitive data. We will cover differential-privacy-based methods as well as methods that defend against specific threat models. |
5:00pm | 5:30pm | Transfer | We show how to alleviate data scarcity by augmenting a small dataset with information learned from other, related datasets, in the style of transfer learning. |
5:30pm | 5:45pm | Q&A | |
5:45pm | 6:00pm | Further Engagement | We discuss ways of further engaging with the application and development of synthcity. |
The interactive tutorials are available here.
Download and use the library, and join its development: raise issues and open pull requests.
If you enjoyed the Lab, why not star synthcity on GitHub? The easiest way to help our community is simply to star the repositories! This helps raise awareness of the tools we are building.
Sign up for our synthetic data mailing list to stay up to date on news about the SyntheticData4ML community. We will post about upcoming tutorials, workshops, competitions, hackathons and more!
Join our Machine Learning Engagement sessions, "Inspiration Exchange", for discussions of our research projects and software, such as synthcity. Sign up here.