Commit

Address feedback on the intro
* Introduce DAFs first, in order to mirror the document structure.

* Explain (function) secret sharing and give examples of aggregation
  functions that can be computed with (V)DAFs

* Give less detail about Prio3 and Poplar1

* Point to discussion about non-collusion in {{overview}}

Also, in security considerations, discuss dealing with non-static
corruptions and how to pick the number of aggregators. This addresses an
idea for the intro, but resolving it seems to require a proper security
consideration.
cjpatton committed Aug 7, 2024
1 parent 45b75e5 commit 9ccd754
Showing 1 changed file with 145 additions and 91 deletions: draft-irtf-cfrg-vdaf.md (236 changes).
@@ -81,6 +81,16 @@ informative:
seriesinfo: CRYPTO 2023
target: https://ia.cr/2023/1012

+BGI15:
+title: "Function Secret Sharing"
+author:
+- ins: E. Boyle
+- ins: N. Gilboa
+- ins: Y. Ishai
+date: 2015
+seriesinfo: EUROCRYPT 2015
+target: https://www.iacr.org/archive/eurocrypt2015/90560300/90560300.pdf
+
CGB17:
title: "Prio: Private, Robust, and Scalable Computation of Aggregate Statistics"
author:
@@ -105,6 +115,8 @@ informative:
- ins: C. Patton
- ins: M. Rosulek
- ins: P. Schoppmann
+date: 2023
+seriesinfo: PETS 2023
target: https://ia.cr/2023/130

Dwo06:
@@ -172,6 +184,18 @@ informative:
seriesinfo: S&P 2020
target: https://eprint.iacr.org/2019/074

+MPDST25:
+title: "Mastic: Private Weighted Heavy-Hitters and Attribute-Based Metrics"
+author:
+- ins: D. Mouris
+- ins: C. Patton
+- ins: H. Davis
+- ins: P. Sarkar
+- ins: N.G. Tsoutsos
+date: 2025
+seriesinfo: PETS 2025
+target: https://eprint.iacr.org/2024/221
+
MPRV09:
title: "Computational Differential Privacy"
author:
@@ -223,8 +247,9 @@ measurement that would result in an invalid aggregate result.

# Introduction

-(TO BE REMOVED BY RFC EDITOR: The source for this draft and and the reference
-implementation can be found at https://github.com/cfrg/draft-irtf-cfrg-vdaf.)
+(RFC EDITOR: Remove this paragraph.) The source for this draft and the
+reference implementation can be found at
+https://github.com/cfrg/draft-irtf-cfrg-vdaf.

The ubiquity of the Internet makes it an ideal platform for measurement of
large-scale phenomena, whether public health trends or the behavior of computer
@@ -234,12 +259,13 @@ that is valuable to measure and information that users consider private.
For example, consider an application that provides health information to users.
The operator of an application might want to know which parts of their
application are used most often, as a way to guide future development of the
-application. Specific users' patterns of usage, though, could reveal sensitive
-things about them, such as which users are researching a given health condition.
+application. Specific users' patterns of usage, though, could reveal sensitive
+things about them, such as which users are researching a given health
+condition.

In many situations, the measurement collector is only interested in aggregate
statistics, e.g., which portions of an application are most used or what
-fraction of people have experienced a given disease. Thus systems that provide
+fraction of people have experienced a given disease. Thus systems that provide
aggregate statistics while protecting individual measurements can deliver the
value of the measurements while protecting users' privacy.

@@ -253,40 +279,88 @@ aggregation server. The aggregation server then adds up the noisy measurements,
and because it knows the distribution from which the noise was sampled, it can
estimate the true sum with reasonable accuracy.

-However, even when noise is added to the measurements, collecting them in the
-clear still reveals a significant amount of information to the collector. On
-the one hand, depending on the "amount" of noise a client adds to its
-measurement, it may be possible for a curious collector to make a reasonable
-guess of the measurement's true value. On the other hand, the more noise the
-clients add, the less reliable will be the server's estimate of the output.
-Thus systems relying solely on a DP mechanism must strike a delicate balance
-between privacy and utility.
+Even when noise is added to the measurements, collecting them in the clear
+still reveals a significant amount of information to the collector. On the one
+hand, depending on the "amount" of noise a client adds to its measurement, it
+may be possible for a curious collector to make a reasonable guess of the
+measurement's true value. On the other hand, the more noise the clients add,
+the less reliable the server's estimate of the aggregate will be. Thus systems
+relying solely on a DP mechanism must strike a delicate balance between privacy
+and utility.

The ideal goal for a privacy-preserving measurement system is that of secure
-multi-party computation (MPC): No participant in the protocol should learn
+multi-party computation (MPC): no participant in the protocol should learn
anything about an individual measurement beyond what it can deduce from the
-differentially private aggregate {{MPRV09}}. In this document, we describe
-Verifiable Distributed Aggregation Functions (VDAFs) as a general class of
-delegated MPC protocols that can be used to achieve this goal.
-
-VDAF schemes achieve their privacy goal by distributing the computation of the
-aggregate among a number of non-colluding aggregation servers. As long as a
-subset of the servers executes the protocol honestly, VDAFs guarantee that no
-measurement is ever accessible to any party besides the client that submitted
-it. VDAFs can also be composed with various DP mechanisms, thereby ensuring the
-aggregate result does not leak too much information about any one measurmment.
-At the same time, VDAFs are "verifiable" in the sense that malformed
-measurements that would otherwise garble the result of the computation can be
-detected and removed from the set of measurements. We refer to this property as
-"robustness".
-
-The VDAF abstraction laid out in {{vdaf}} represents a class of multi-party
-protocols for privacy-preserving measurement proposed in the literature. These
-protocols vary in their operational and security requirements, sometimes in
-subtle but consequential ways. This document therefore has two important goals:
-
-1. Providing higher-level protocols like {{?DAP=I-D.draft-ietf-ppm-dap}} with
-a simple, uniform interface for accessing privacy-preserving measurement
+aggregate. MPC achieves this goal by distributing the computation of the
+aggregate across multiple aggregation servers, one of which is presumed to be
+honest, i.e., not under the control of the attacker. Moreover, MPC can be
+composed with various DP mechanisms to ensure the aggregate itself does not
+leak too much information about any one of the measurements {{MPRV09}}.
+
+This document describes two classes of MPC protocols, each aiming for a
+different set of goals.
+
+In a Distributed Aggregation Function (DAF, {{daf}}), each client splits its
+measurement into multiple secret shares, one for each aggregation
+server. DAFs require two properties of the secret sharing scheme. First, we can
+reconstruct the underlying measurement by simply adding up all of the shares.
+(Typically the shares are vectors over some finite field.) Second, given all
+but one of the shares, it is impossible to learn anything about the underlying
+measurement. These properties give rise to a simple strategy for privately
+aggregating the measurements: each aggregation server adds up its measurement
+shares locally before revealing their sum to the data collector; then all
+the data collector has to do is add up these sums to get the aggregate.
+
+This strategy is compatible with any aggregation function that can be
+represented as the sum of some encoding of the measurements. Examples include:
+summary statistics such as sum, mean, and standard deviation; estimation of
+quantiles, e.g., median; histograms; linear regression; or counting data
+structures, e.g., Bloom filters. However, not all functions fit into this
+rubric, as it is constrained to linear computations over the encoded
+measurements.
+
+In fact, our framework admits DAFs with slightly more
+functionality, computing aggregation functions of the form
+
+~~~
+f(agg_param, meas_1, ..., meas_N) =
+g(agg_param, meas_1) + ... + g(agg_param, meas_N)
+~~~
+
+where `meas_1, ..., meas_N` are the measurements, `g` is a possibly non-linear
+function, and `agg_param` is a parameter of that function chosen by the data
+collector. This paradigm, known as function secret sharing {{BGI15}}, allows
+for more sophisticated data analysis tasks, such as grouping metrics by private
+client attributes {{MPDST25}} or computing heavy hitters {{BBCGGI21}}. (More
+on the latter task below.)
+
+The second class of protocols defined in this document is called Verifiable
+Distributed Aggregation Functions (VDAFs, {{vdaf}}). In addition to being
+private, VDAFs are verifiable in the following sense. By design, a secret
+sharing of a valid measurement, e.g., a number between 1 and 10, is
+indistinguishable from a secret sharing of an invalid measurement, e.g., a
+number larger than 10. This means that DAFs are vulnerable to attacks from
+malicious clients attempting to disrupt the computation by submitting invalid
+measurements. Thus VDAFs are designed to allow the servers to detect and remove
+these measurements prior to aggregation. We refer to this property as
+robustness.
+
+Achieving robustness without sacrificing privacy requires the servers to
+interact with one another over a number of rounds of communication. DAFs, on
+the other hand, are non-interactive and are therefore easier to deploy; but
+they do not provide robustness on their own. This may be tolerable in some
+applications. For instance, if the client's software is executed in a trusted
+execution environment, it may be reasonable to assume that no client is
+malicious.
+
+The DAF and VDAF abstractions encompass a variety of MPC techniques in the
+literature. These protocols vary in their operational and security
+requirements, sometimes in subtle but consequential ways. This document
+therefore has two important goals:
+
+1. Providing higher-level protocols like {{?DAP=I-D.draft-ietf-ppm-dap}} (RFC
+EDITOR: remove this reference if not published before the current document)
+with a simple, uniform interface for accessing privacy-preserving measurement
schemes, documenting relevant operational and security requirements, and
specifying constraints for safe usage:

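The DAF strategy described in the introduction above (each client splits its measurement into additive shares, each aggregation server sums its shares locally, and the data collector adds up those sums) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the DAF syntax the draft defines: the prime `MODULUS` and the helpers `encode`, `shard`, `add`, and `run` are hypothetical. The encoding `[1, m, m*m]` is one example of writing summary statistics (count, sum, and sum of squares, hence mean and variance) as a sum of encoded measurements.

```python
import secrets

# Hypothetical prime modulus for illustration only; the (V)DAFs in the draft
# define their own finite fields.
MODULUS = 2**61 - 1


def encode(measurement: int) -> list[int]:
    # Summing these vectors yields (count, sum, sum of squares).
    return [1, measurement % MODULUS, (measurement * measurement) % MODULUS]


def shard(vec: list[int], num_shares: int) -> list[list[int]]:
    # All shares but the last are uniformly random; the last one is chosen
    # so that the shares add up to `vec` coordinate-wise.
    shares = [[secrets.randbelow(MODULUS) for _ in vec]
              for _ in range(num_shares - 1)]
    last = list(vec)
    for share in shares:
        last = [(a - b) % MODULUS for a, b in zip(last, share)]
    return shares + [last]


def add(a: list[int], b: list[int]) -> list[int]:
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def run(reports: list[list[int]], num_aggregators: int = 2) -> list[int]:
    # Each client shards its encoded measurement, one share per Aggregator.
    sharded = [shard(vec, num_aggregators) for vec in reports]
    width = len(reports[0])
    # Each Aggregator sums only the shares addressed to it.
    agg_shares = []
    for j in range(num_aggregators):
        acc = [0] * width
        for shares in sharded:
            acc = add(acc, shares[j])
        agg_shares.append(acc)
    # The Collector adds up the aggregate shares to get the aggregate.
    result = [0] * width
    for agg_share in agg_shares:
        result = add(result, agg_share)
    return result


honest = [encode(m) for m in [3, 5, 7, 9]]
count, total, total_sq = run(honest)        # (4, 24, 164)
mean = total / count                        # 6.0
variance = total_sq / count - mean ** 2     # 5.0

# A malicious client submits a vector that is not a valid encoding.
skewed = run(honest + [[1, 10**6, 0]])      # [5, 1000024, 164]
```

The last call shows the weakness the introduction attributes to DAFs: `[1, 10**6, 0]` is not the encoding of any measurement (its square term is inconsistent with its value), yet each Aggregator sees only a uniformly random-looking share, and the aggregate is silently skewed. Detecting and removing such reports before aggregation is the purpose of the robustness property of VDAFs.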
@@ -305,65 +379,34 @@ This document also specifies two concrete VDAF schemes, each based on a protocol
from the literature.

* The Prio system {{CGB17}} allows for the privacy-preserving computation of a
-variety aggregate statistics. The basic idea underlying Prio is fairly
-simple:
-
-1. Each client shards its measurement into a sequence of additive shares and
-distributes the shares among the aggregation servers.
-1. Next, each server adds up its shares locally, resulting in an additive
-share of the aggregate.
-1. Finally, the aggregation servers send their aggregate shares to the data
-collector, who combines them to obtain the aggregate result.
-
-The difficult part of this system is ensuring that the servers hold shares of
-a valid, aggregatable value, e.g., the measurement is an integer in a
-specific range. Thus Prio specifies a multi-party protocol for accomplishing
-this task.
-
-In {{prio3}} we describe Prio3, a VDAF that follows the same overall framework
-as the original Prio protocol, but incorporates techniques introduced in
+variety of aggregate statistics, combining additive secret sharing as described
+above with a mechanism for checking the validity of each measurement. In
+{{prio3}} we specify Prio3, a VDAF that follows the same overall framework as
+the original Prio protocol, but incorporates techniques introduced in
{{BBCGGI19}} that result in significant performance gains.

-* More recently, Boneh et al. {{BBCGGI21}} described a protocol called Poplar
-for solving the `t`-heavy-hitters problem in a privacy-preserving manner. Here
-each client holds a bit-string of length `n`, and the goal of the aggregation
-servers is to compute the set of strings that occur at least `t` times. The
-core primitive used in their protocol is a specialized Distributed Point
-Function (DPF) {{GI14}} that allows the servers to "query" their DPF shares on
-any bit-string of length shorter than or equal to `n`. As a result of this
-query, each of the servers has an additive share of a bit indicating whether
-the string is a prefix of the client's string. The protocol also specifies a
-multi-party computation for verifying that at most one string among a set of
-candidates is a prefix of the client's string.
-
-In {{poplar1}} we describe a VDAF called Poplar1 that implements this
-functionality.
-
-Finally, perhaps the most complex aspect of schemes like Prio3 and Poplar1 is
-the process by which the client-generated measurements are prepared for
-aggregation. Because these constructions are based on secret sharing, the
-servers will be required to exchange some amount of information in order to
-verify the measurement is valid and can be aggregated. Depending on the
-construction, this process may require multiple round trips over the network.
-
-There are applications in which this verification step may not be necessary,
-e.g., when the client's software is run a trusted execution environment. To
-support these applications, this document also defines Distributed Aggregation
-Functions (DAFs) as a simpler class of protocols that aim to provide the same
-privacy guarantee as VDAFs but fall short of being verifiable.
-
-> OPEN ISSUE Decide if we should give one or two example DAFs. There are natural
-> variants of Prio3 and Poplar1 that might be worth describing.
-
-The remainder of this document is organized as follows: {{overview}} gives a
-brief overview of DAFs and VDAFs; {{daf}} defines the syntax for DAFs; {{vdaf}}
+* The Poplar protocol {{BBCGGI21}} solves the heavy-hitters problem in a
+privacy-preserving manner. Here each client holds a bit-string, and the goal
+of the aggregation servers is to compute the set of strings that occur at
+least `t` times for some threshold `t`. The core primitive in their protocol
+is a secret sharing of a point function {{GI14}} (`g` in the notation above)
+that allows the servers to privately count how many of the clients' strings
+begin with a given prefix (`agg_param` in the notation above). In {{poplar1}}
+we specify a VDAF called Poplar1 that implements this functionality.
+
+The remainder of this document is organized as follows: {{conventions}} lists
+definitions and conventions used for specification; {{overview}} gives a brief
+overview of DAFs and VDAFs, the parties involved in the computation, and the
+requirements for non-collusion; {{daf}} defines the syntax for DAFs; {{vdaf}}
defines the syntax for VDAFs; {{prelim}} defines various functionalities that
are common to our constructions; {{prio3}} describes the Prio3 construction;
{{poplar1}} describes the Poplar1 construction; and {{security}} enumerates the
-security considerations for VDAFs.
+security considerations for DAFs and VDAFs.

## Change Log

(RFC EDITOR: Remove this section.)

(\*) Indicates a change that breaks wire compatibility with the previous draft.

10:
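The Poplar bullet in the introduction ties heavy hitters to the form `f(agg_param, meas_1, ..., meas_N) = g(agg_param, meas_1) + ... + g(agg_param, meas_N)`, where `g` reports whether a measurement begins with the prefix `agg_param`. The Python sketch below is a deliberately naive stand-in, assuming hypothetical parameters `MODULUS` and `BITS` and hypothetical helper names. Each client additively secret-shares the entire truth table of its point function. This hides the measurement from either Aggregator alone, but the keys grow exponentially in `BITS`, whereas Poplar1 relies on a compact distributed point function {{GI14}}. The sketch also omits any validity check, so it illustrates the DAF-style computation rather than Poplar1 itself.

```python
import secrets
from itertools import product

MODULUS = 2**61 - 1  # hypothetical prime modulus for illustration
BITS = 4             # hypothetical (tiny) bit-string length


def all_strings(length: int) -> list[str]:
    return ["".join(bits) for bits in product("01", repeat=length)]


def shard_point_function(alpha: str) -> tuple[dict, dict]:
    # Naive function secret sharing of the point function that is 1 at
    # `alpha` and 0 elsewhere: additively share its full truth table.
    key0, key1 = {}, {}
    for x in all_strings(BITS):
        r = secrets.randbelow(MODULUS)
        key0[x] = r
        key1[x] = ((1 if x == alpha else 0) - r) % MODULUS
    return key0, key1


def eval_prefix(key: dict, prefix: str) -> int:
    # Share of g(prefix, meas): summing shares over every completion of the
    # prefix is linear, so each Aggregator can do it locally.
    return sum(v for x, v in key.items() if x.startswith(prefix)) % MODULUS


def prefix_counts(keys0: list, keys1: list, prefixes: list[str]) -> dict:
    # Each Aggregator computes its share of f(agg_param, ...) per prefix;
    # the Collector adds the two aggregate shares.
    agg0 = [sum(eval_prefix(k, p) for k in keys0) % MODULUS for p in prefixes]
    agg1 = [sum(eval_prefix(k, p) for k in keys1) % MODULUS for p in prefixes]
    return {p: (a + b) % MODULUS for p, a, b in zip(prefixes, agg0, agg1)}


def heavy_hitters(measurements: list[str], threshold: int) -> list[str]:
    keys = [shard_point_function(m) for m in measurements]
    keys0 = [k0 for k0, _ in keys]
    keys1 = [k1 for _, k1 in keys]
    candidates = [""]
    for _ in range(BITS):
        # The Collector's next agg_param: each surviving prefix extended by
        # one bit.
        prefixes = [c + b for c in candidates for b in "01"]
        counts = prefix_counts(keys0, keys1, prefixes)
        candidates = [p for p in prefixes if counts[p] >= threshold]
    return sorted(candidates)


meas = ["1010", "1010", "1010", "0110", "0110", "1111"]
print(heavy_hitters(meas, 2))  # ['0110', '1010']
```

Because each Aggregator's work for a prefix is only a sum over its own shares, the Collector can choose the next set of prefixes after seeing the previous level's counts. This is why `agg_param` is chosen by the data collector during aggregation rather than fixed by the clients.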
@@ -604,7 +647,7 @@ security considerations for VDAFs.
* Remove public parameter and replace verification parameter with a
"verification key" and "Aggregator ID".

-# Conventions and Definitions
+# Conventions and Definitions {#conventions}

{::boilerplate bcp14-tagged}

@@ -731,7 +774,8 @@ correctness of the measurements obtained. The privacy properties of the system
are assured by non-collusion among Aggregators, and Aggregators are the entities
that perform validation of Client measurements. Thus Clients trust Aggregators
not to collude (typically it is required that at least one Aggregator is
-honest), and Collectors trust Aggregators to correctly run the protocol.
+honest; see {{num-aggregators}}), and Collectors trust Aggregators to correctly
+run the protocol.

Within the bounds of the non-collusion requirements of a given (V)DAF instance,
it is possible for the same entity to play more than one role. For example, the
@@ -1757,7 +1801,7 @@ number of rounds of preparation that are required, there may be one more
message to send before the peer can also finish processing (i.e., `outbound !=
None`).

-## Star Topology (Any Number of Aggregators)
+## Star Topology (Any Number of Aggregators) {#star-topo}

The ping-pong topology of the previous section is only suitable for VDAFs
involving exactly two Aggregators. If the VDAF supports more than two
@@ -5157,7 +5201,7 @@ any positive value of `BITS`. Test vectors can be found in {{test-vectors}}.

# Security Considerations {#security}

-VDAFs have two essential security goals:
+VDAFs ({{vdaf}}) have two essential security goals:

1. Privacy: An attacker that controls the Collector and a subset of Clients and
Aggregators learns nothing about the measurements of honest Clients beyond
@@ -5421,6 +5465,16 @@ instead, but `PROOFS` MUST be set to at least `3`. Breaking robustness for
`PROOFS == 2` is feasible, if impractical; but `PROOFS == 1` is completely
broken for such a small field.

+## Choosing the Number of Aggregators {#num-aggregators}
+
+At least two Aggregators are required for privacy in our threat model, but some
+(V)DAFs, including Prio3 ({{prio3}}), allow for any number of Aggregators, only
+one of which needs to be trusted in order for the computation to be private. To
+hedge against corruptions that happen during the course of the attack (i.e.,
+non-static corruptions), deployments may consider involving more than two
+Aggregators, as described for example in {{star-topo}}. Note, however, that
+some schemes, such as Poplar1, are not compatible with this mode of operation.
+
# IANA Considerations

A codepoint for each (V)DAF in this document is defined in the table below. Note
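The new "Choosing the Number of Aggregators" section relies on the fact that, with additive secret sharing, any coalition missing at least one Aggregator learns nothing about a measurement from its shares. The Python sketch below checks that claim exhaustively, assuming a hypothetical tiny modulus `P = 5` small enough to enumerate every sharing; the helper names are hypothetical. It concerns only the secret sharing, not the rest of a protocol such as Prio3, and says nothing about what Aggregators might learn from other messages.

```python
from collections import Counter
from itertools import product

# Hypothetical tiny prime modulus, small enough to enumerate every sharing.
P = 5


def sharings(measurement: int, n: int):
    # Every additive sharing of `measurement` into n shares mod P: the first
    # n - 1 shares take all values, and the last share is then determined.
    for head in product(range(P), repeat=n - 1):
        yield head + ((measurement - sum(head)) % P,)


def coalition_view(measurement: int, n: int,
                   coalition: tuple[int, ...]) -> Counter:
    # How often each combination of shares held by `coalition` occurs when
    # the client's randomness is uniform.
    return Counter(tuple(shares[i] for i in coalition)
                   for shares in sharings(measurement, n))


# Any two of three Aggregators see the same distribution for 0 and for 4...
same = all(coalition_view(0, 3, c) == coalition_view(4, 3, c)
           for c in [(0, 1), (0, 2), (1, 2)])
# ...but all three together can recover the measurement.
differs = coalition_view(0, 3, (0, 1, 2)) != coalition_view(4, 3, (0, 1, 2))
```

Adding Aggregators therefore enlarges the coalition an attacker must fully corrupt, which is the hedge the section describes; whether a given scheme supports more than two Aggregators (Poplar1 does not) is a separate question.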