Ektelo is an operator-based framework for implementing privacy algorithms. It was first presented at SIGMOD 2018:
- Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, and Gerome Miklau. 2018. EKTELO: A Framework for Defining Differentially-Private Computations. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). ACM, New York, NY, USA, 115-130. DOI: https://doi.org/10.1145/3183713.3196921
In the documentation below, this is referred to as the "Ektelo paper."
Licensed under the Apache License, Version 2.0.
There are two complementary objectives of the Ektelo project:
- Isolate private interactions with data in a compact and secure kernel.
- Modularize privacy-related algorithms into operators, which promote code reuse and assist in keeping the kernel compact and secure.
The layout of the Ektelo repository reflects these goals. Code that is intended
to run on a private server is found in the module `ektelo/private`, while
non-private client code is located in the module `ektelo/client`. We assume
that the kernel will be set up on a private server by an entity with access to
the unaltered, private data. Along with the kernel, a kernel service
responsible for servicing client requests must also be set up on the server. On
the client side, a privacy engineer creates a protected data source, which
mediates all interactions with the kernel via communication with the kernel
service.
Ektelo is designed to support interactive data queries from the privacy
engineer to the kernel. To do so, a separate kernel instance is instantiated
with a specific privacy budget for every user. The kernel tracks the total
privacy expenditure as each query is processed, according to Algorithm 6 in the
Ektelo paper. User queries are serviced until the budget has been exceeded.
At that point, a `BudgetExceeded` error is sent back to the user.
- File `examples/cdf_estimator.py` provides an example of the entire Ektelo workflow. This example aligns with Algorithm 1 from the Ektelo paper.
- File `examples/standalone_plan.py` provides an example of a previously published algorithm expressed as an Ektelo plan consisting of a sequence of Ektelo operators. The algorithm in this case is MWEM (Hardt et al. "A Simple and Practical Algorithm for Differentially Private Data Release." NIPS 2012). Note that this example excludes the layer that manages the interaction between client code and the protected kernel. While removing this layer makes it easier to trace the plan, it also removes the privacy protection (i.e., the variable `R` corresponds to the input dataset, so adding `print(R)` would result in full disclosure of the "private" input). We imagine that writing Ektelo plans in this "stripped down" form may be useful for privacy researchers who are designing new algorithms and only executing them on non-sensitive inputs.
- File `examples/private_plan.py` is the same as the previous example (`standalone_plan.py`) except that it includes the layer that manages client-kernel interaction. In this example, any interactions with the private data are mediated by the kernel, which will ensure protection. In particular, the `R` variable is now a `ProtectedDataSource`, and invoking a method on `R` will trigger an interaction with the kernel. This example illustrates how a complex differentially private algorithm can be executed via client calls to the protected kernel.
- File `examples/budget_exceeded.py` provides an example of a client-kernel interaction that produces such a `BudgetExceeded` error.
The second and third examples above (`standalone_plan.py` and `private_plan.py`) illustrate the MWEM algorithm written as an Ektelo plan. Other algorithms from the literature have also been written as plans, in two places: `plans/standalone.py` and `plans/private.py`. The standalone plans exclude the client-kernel layer (like `standalone_plan.py`), and the private plans include it (like `private_plan.py`).
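Once the setup steps in this README are complete, the example scripts can be run one after another. The function below is a hypothetical sketch, not part of the repository: `run_examples` is an illustrative name, it assumes each script runs with no command-line arguments, and it uses `python` from the active virtual environment unless the `PYTHON` variable names another interpreter. It reports each script's exit status instead of stopping at the first failure.

```bash
# Hypothetical helper (not shipped with Ektelo): run example scripts from
# $EKTELO_HOME/examples in sequence and report the exit status of each.
# Assumes the scripts need no command-line arguments.
run_examples() {
  local py name
  py="${PYTHON:-python}"   # interpreter; e.g. PYTHON=python3 if "python" is absent
  for name in "$@"; do
    if "$py" "${EKTELO_HOME:-.}/examples/$name"; then
      echo "OK   $name"
    else
      # In the else branch, $? still holds the script's exit status
      echo "FAIL $name (exit $?)"
    fi
  done
}
```

For example, `run_examples cdf_estimator.py standalone_plan.py private_plan.py budget_exceeded.py`. Depending on whether `budget_exceeded.py` lets the `BudgetExceeded` error propagate, that script may be reported as a failure.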
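Before installing anything, set up the environment by exporting the following variables (adjust the paths for your system):

```bash
export EKTELO_HOME=$HOME/Documents/ektelo
export EKTELO_DATA=/tmp/ektelo
export PYTHON_HOME=$HOME/virtualenvs/PyEktelo
export PYTHONPATH=$PYTHONPATH:$EKTELO_HOME
export EKTELO_LOG_PATH=$HOME/logs
export EKTELO_LOG_LEVEL=DEBUG
```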
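To catch a missing variable before running the remaining setup steps, a check like the following can be pasted into the shell. This is a minimal sketch, not part of Ektelo: the name `check_ektelo_env` is hypothetical, and because it reads values with `printenv`, it assumes the variables were exported as shown above (unexported shell variables count as unset).

```bash
# Hypothetical helper (not shipped with Ektelo): report Ektelo variables that
# are unset or empty, and check that PYTHONPATH contains $EKTELO_HOME.
# Returns non-zero if anything is missing.
check_ektelo_env() {
  local missing=0 var
  for var in EKTELO_HOME EKTELO_DATA PYTHON_HOME EKTELO_LOG_PATH EKTELO_LOG_LEVEL; do
    # printenv prints nothing for variables that are not exported
    if [ -z "$(printenv "$var")" ]; then
      echo "not set: $var" >&2
      missing=1
    fi
  done
  # Surround with colons so the entry matches at the start, middle, or end
  case ":${PYTHONPATH:-}:" in
    *":${EKTELO_HOME:-}:"*) ;;
    *) echo "PYTHONPATH does not include \$EKTELO_HOME" >&2; missing=1 ;;
  esac
  return "$missing"
}
```

After the `export` lines, `check_ektelo_env && echo ok` should print `ok`.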
Various system-level packages are needed to build the third-party Python modules installed during initialization. The dependencies vary by platform. Python 3.6 or higher is strongly recommended.

Be sure to set up the environment (described above) first. The following commands should work on Debian-based systems:

```bash
sudo apt-get update
sudo apt-get install --reinstall build-essential
sudo apt-get install gfortran liblapack-dev libblas-dev python3-venv
sudo apt-get install libpq-dev python3-dev libncurses5-dev swig glpk-utils
```

On macOS, install SWIG with Homebrew:

```bash
brew install swig
```
Next, create the log directory and a Python virtual environment, then install the required Python modules:

```bash
mkdir -p "$EKTELO_LOG_PATH"
python3 -m venv "$PYTHON_HOME"
source "$PYTHON_HOME/bin/activate"
cd "$EKTELO_HOME"
pip install -r resources/requirements.txt
```
Note: We recommend installing Python modules with the same versions specified in
`resources/requirements.txt`. However, if you are running a Python version newer
than 3.6, you may need to increase the module versions as well. This can be
accomplished by replacing `==` with `>=` in the requirements file.
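One way to make that replacement is with `sed`. The function below is a hypothetical convenience wrapper, not part of the repository; it keeps a backup of the original file (suffix `.bak`) so the pinned versions can be restored.

```bash
# Hypothetical helper: turn exact pins (pkg==1.2.3) into minimum-version
# requirements (pkg>=1.2.3). The original file is kept as "<file>.bak".
relax_requirement_pins() {
  # Attaching the suffix to -i (-i.bak) works with both GNU sed (Linux)
  # and BSD sed (macOS).
  sed -i.bak 's/==/>=/g' "$1"
}
```

For example, run `relax_requirement_pins "$EKTELO_HOME/resources/requirements.txt"` and then repeat `pip install -r resources/requirements.txt`.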
The data must be downloaded into the `$EKTELO_DATA` folder:

```bash
mkdir -p "$EKTELO_DATA"
curl https://www.dpcomp.org/data/cps.csv > "$EKTELO_DATA/cps.csv"
curl https://www.dpcomp.org/data/stroke.csv > "$EKTELO_DATA/stroke.csv"
```
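Because `curl` as written saves whatever the server returns, a failed download can leave an error page or an empty file in place of the data. The hypothetical check below (not part of Ektelo) only confirms that both files exist and are non-empty; it does not validate their contents.

```bash
# Hypothetical helper (not shipped with Ektelo): confirm that the dataset
# files exist in $EKTELO_DATA and are non-empty. Contents are not validated.
check_ektelo_data() {
  local status=0 f
  for f in cps.csv stroke.csv; do
    # -s is true only for a file that exists and has a size greater than zero
    if [ ! -s "${EKTELO_DATA:-}/$f" ]; then
      echo "missing or empty: ${EKTELO_DATA:-}/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Adding curl's `-f` (`--fail`) option to the download commands makes curl write nothing on HTTP errors, so a failed download leaves an empty file (the shell creates it before curl runs), which this check then reports.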
Finally, with the virtual environment active, compile the C libraries as follows:

```bash
cd "$EKTELO_HOME/ektelo/algorithm"
./setup.sh
```
Once initialization has been run, the virtual environment can be restored in a new shell with the following command:

```bash
source "$PYTHON_HOME/bin/activate"
```
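When working across several shells, it can help to check whether the Ektelo virtual environment is already active before activating it again. This is a minimal sketch with a hypothetical function name; it relies on the venv `activate` script setting `VIRTUAL_ENV` to the environment's directory, and it assumes `$PYTHON_HOME` holds the same absolute path used when the environment was created (the comparison is a plain string match, so a trailing slash or a symlinked path will not match).

```bash
# Hypothetical helper: succeed only if the virtualenv at $PYTHON_HOME is the
# one currently active (the venv "activate" script sets VIRTUAL_ENV).
ektelo_venv_active() {
  [ -n "${VIRTUAL_ENV:-}" ] && [ "$VIRTUAL_ENV" = "${PYTHON_HOME:-}" ]
}
```

Typical usage: `ektelo_venv_active || source "$PYTHON_HOME/bin/activate"`.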
To run the full test suite, execute the following from the root of the repository:

```bash
cd "$EKTELO_HOME"
nosetests
```
To test a specific test class (in this case, `TestData` in the module `test.unit.test_data`):

```bash
nosetests test.unit.test_data:TestData
```