
Workshop on Differentially Private Statistics Release

Differential privacy is a rigorous mathematical definition of privacy. An algorithm (such as one computing a dataset's mean, sum, or count) is said to be differentially private if, by looking at its output, one cannot tell whether any individual's data was included in the original dataset. In other words, the guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset -- providing individuals with plausible deniability that is, in effect, privacy.
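
One common way to state this formally: a randomised algorithm M is ε-differentially private if, for every pair of datasets D and D' that differ in a single individual's record and every set of possible outputs S,

Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D') ∈ S]

so no output becomes much more (or less) likely because one particular person's data is present.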



In this workshop you will learn how to generate and release basic statistical outcomes in a differentially private manner. Specifically, we will release the following queries, which a data analyst has requested from the trusted data curator/holder. The data curator holds survey data on researchers (synthetically generated). A short sketch of how such a release can work appears after the list.

  • Count of researchers
  • Sum of researchers' income
  • Mean of researchers' income
  • Count of researchers by sector
  • Count of researchers by sector and academic degree
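
To give a flavour of what the notebooks do, here is a minimal sketch of releasing a count, sum, and mean with the Laplace mechanism using plain NumPy. The incomes, bounds, and budget below are illustrative assumptions, not the workshop dataset or its exact workflow.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw incomes held by the curator; NOT the workshop dataset.
incomes = rng.uniform(20_000, 150_000, size=1_000)

epsilon = 1.0                      # privacy-loss budget spent on each query
lower, upper = 0, 200_000          # clamping bounds chosen by the curator
clamped = np.clip(incomes, lower, upper)

# Count: adding or removing one person changes the count by at most 1.
dp_count = len(clamped) + rng.laplace(scale=1 / epsilon)

# Sum: with a lower bound of 0, adding or removing one person changes
# the sum by at most `upper`, so that is the sensitivity.
dp_sum = clamped.sum() + rng.laplace(scale=upper / epsilon)

# Mean as post-processing of the two noisy answers (no extra privacy cost).
dp_mean = dp_sum / dp_count

print(round(dp_count), round(dp_sum, 2), round(dp_mean, 2))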

The data curator values the privacy of their survey participants, so they want to make sure that the data they release does not reveal anything about specific individuals. This is a perfect use case for differential privacy: it will allow us to publish useful insights about groups, while protecting data about individuals.

We will also visualise and learn about the impact of the following variables on the accuracy of the queries (a short illustration follows the list):

  • Epsilon (ε): The privacy loss incurred by researchers in the dataset. Larger values indicate less privacy and more accuracy.
  • Sensitivity: The worst case change in a query's output when a row is removed/added. Noise scales with the sensitivity of a query.
  • Clamping bounds: Clipping the raw values to the set lower and upper bounds. Noise scales with the size of the bounds.
  • Dataset size: With a larger dataset size, noise cancels out, improving accuracy.
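
To make the first three variables concrete, the sketch below (with arbitrary numbers) shows how the Laplace noise scale grows as ε shrinks or the clamping bounds widen:

import numpy as np

rng = np.random.default_rng(1)

def laplace_noise_scale(sensitivity, epsilon):
    # The Laplace mechanism adds noise with scale b = sensitivity / epsilon,
    # so smaller epsilon or larger sensitivity means noisier answers.
    return sensitivity / epsilon

for epsilon in (0.1, 1.0, 10.0):
    for upper_bound in (100_000, 1_000_000):   # clamping upper bound of a sum query
        b = laplace_noise_scale(sensitivity=upper_bound, epsilon=epsilon)
        draw = rng.laplace(scale=b)
        print(f"epsilon={epsilon:<5} bound={upper_bound:<9,} scale={b:>12,.0f} sample noise={draw:>14,.0f}")

For a mean, the same absolute noise matters less as the dataset grows, which is why larger datasets yield more accurate differentially private answers.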

You will also be introduced to the concepts of parallel composition and post-processing.
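
As a quick preview: counts over disjoint groups (such as the per-sector counts above) enjoy parallel composition, so each group's count can be answered with the full ε rather than a split budget, and any transformation applied to an already-released value, such as rounding or clipping a negative noisy count to zero, is post-processing and incurs no additional privacy cost. A minimal sketch, assuming hypothetical sector counts:

import numpy as np

rng = np.random.default_rng(2)
epsilon = 1.0

# Hypothetical per-sector counts (disjoint groups of researchers).
true_counts = {"academia": 412, "industry": 305, "government": 283}

noisy_counts = {}
for sector, count in true_counts.items():
    # Parallel composition: the groups are disjoint, so each count
    # can spend the full epsilon instead of epsilon / number_of_groups.
    noisy = count + rng.laplace(scale=1 / epsilon)
    # Post-processing: rounding and clipping at zero cost no extra privacy.
    noisy_counts[sector] = max(0, round(noisy))

print(noisy_counts)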

Additionally, we have prepared this notebook for you to visualise and understand the basic mechanisms (algorithms) -- the Laplace and Gaussian mechanisms. They add randomised noise drawn from the Laplace and Gaussian distributions and satisfy the differential privacy definition (pure and approximate, respectively). You will also gain an understanding of how the privacy parameters affect the noise scale (the variance of the distribution).
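
As a rough preview of that notebook: both mechanisms add zero-mean noise, but they are calibrated differently. The sketch below uses the standard textbook calibrations (Laplace scale = sensitivity / ε; the classical Gaussian σ = sqrt(2 ln(1.25/δ)) · sensitivity / ε, valid for ε < 1); the exact parameters used in the notebook may differ.

import numpy as np

rng = np.random.default_rng(3)

sensitivity = 1.0          # e.g. a counting query
epsilon, delta = 0.5, 1e-5

# Laplace mechanism (pure epsilon-DP): scale b = sensitivity / epsilon.
laplace_scale = sensitivity / epsilon
laplace_noise = rng.laplace(scale=laplace_scale, size=100_000)

# Gaussian mechanism ((epsilon, delta)-DP): classical calibration,
# sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon.
gaussian_sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
gaussian_noise = rng.normal(scale=gaussian_sigma, size=100_000)

print(f"Laplace scale:  {laplace_scale:.2f}, empirical std {laplace_noise.std():.2f}")
print(f"Gaussian sigma: {gaussian_sigma:.2f}, empirical std {gaussian_noise.std():.2f}")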

Also, please refer to this notebook if you want to create a synthetic dataset.
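
If you just want a small dataset to experiment with before the workshop, something along these lines can be generated with pandas and NumPy; the column names and distributions below are illustrative guesses, not the schema used in that notebook.

import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 1_000

# Toy survey of researchers; columns and ranges are made up for illustration.
df = pd.DataFrame({
    "sector": rng.choice(["academia", "industry", "government"], size=n),
    "degree": rng.choice(["bachelor", "master", "phd"], size=n),
    "income": rng.normal(loc=80_000, scale=25_000, size=n).clip(min=20_000).round(2),
})

df.to_csv("synthetic_researchers.csv", index=False)
print(df.head())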

For this workshop, participants only need some basic Python programming knowledge.

Getting started 🚀

You can either use the online Colab notebook here [add link] or set up a local Jupyter notebook by following the steps below.

Install JupyterLab

pip install jupyterlab

Clone the repository

git clone https://github.com/anshu-gt/STACK_2022_differential_privacy_workshop

Install dependencies

pip install -r requirements.txt

Launch JupyterLab from the terminal

jupyter lab

The code has been developed with Python 3.8.

Resources 📚

Here are some resources that may be of help to you:

Articles and videos

Papers

Courses

Differential Privacy Libraries

Contributor 🤓

@anshu-gt (contact: [email protected])

💪 by the Data Privacy Protection Capability Centre, GovTech
