# TF Federated Aggregators Placement

| Status | Proposed |
| :------------- | :---------------------------------------------------- |
| **RFC #** | https://github.com/tensorflow/community/pull/TODO |
| **Author(s)** | Jason Mancuso ([email protected]) |
| **Sponsor** | Michael Reneer ([email protected]) |
| **Updated** | 2020-05-04 |

## Objective

This document proposes adding a `tff.AGGREGATORS` placement to the Federated Core
(FC) in TensorFlow Federated (TFF). This would lift the requirement that all
aggregations be computed on `tff.SERVER` while still allowing users to express
custom aggregation logic using FC & TF.

## Motivation

When approaching federated learning with an eye for security or privacy, it is
useful to divide federated computation into two categories: computations that
perform aggregation, and computations that run on-device. Security and
privacy issues tend to show up during the aggregation phase. This is particularly
clear when looking at common methods of adding security guarantees to traditional,
parameter-server style federated learning, for example with secure aggregation or
differentially private federated averaging (DP-FedAvg).

In security-heightened settings, it is often worthwhile to separate computation
done in this aggregation phase from computation performed by the server in the
traditional parameter server setup. This amounts to delegating aggregations to a
third-party service. For example, when the clients are mistrustful of the server,
aggregations might be delegated to a trusted execution environment or to a cluster
of machines engaging in a secure multi-party computation protocol. Another example
is secure aggregation in the
[Encode-Shuffle-Analyze (ESA)](https://arxiv.org/abs/1710.00901)
model, which in a federated context generally assumes an additional
party to perform the secure shuffling needed to realize a differential privacy
guarantee. Since this is an established area of the literature with strong
motivations and results, we see this as an important line of work for TFF to
support, in keeping with its
[project goals](https://github.com/tensorflow/federated#tensorflow-federated).

In general, any secure aggregation protocol can be represented as a coordinated
computation between three groups of parties: a server, a (potentially singleton)
set of aggregators, and a set of clients. Note that these need not be mutually
exclusive, so for example the traditional parameter server setting can be recovered
as a special case by treating the server as a singleton aggregators set.

The TFF Federated Core (FC) language currently realizes logically-distinct parties
as "placements". While there exist `tff.SERVER` and `tff.CLIENTS` placements in FC,
there is no `tff.AGGREGATORS` placement. Without such a placement, implementing new
aggregation protocols in TFF can require low-level programming of the TFF executor
stacks, as evidenced by
[this community attempt to integrate secure aggregation](https://github.com/tf-encrypted/rfcs/blob/master/20190924-tensorflow-federated/integration-strategies.md).
By adding a new `tff.AGGREGATORS` placement, users can more easily implement new
aggregation protocols by expressing them as federated computations in FC.

## User Benefit

Users will be able to express custom aggregation protocols in the Federated Core
by working with federated data placed on `tff.AGGREGATORS`, rather than being
constrained by the current set of federated types in FC.

## Design Proposal

Adding the `tff.AGGREGATORS` placement for federated types involves adding a new
`Placement` and `PlacementLiteral`, and then extending the compiler to recognize
federated values with this placement when computing intrinsics. The compiler
generally defines separate intrinsics by placement; e.g.
`tff.federated_value(value, placement)` is actually interpreted by the compiler as
`federated_value_at_clients(value)` or `federated_value_at_server(value)`,
depending on the provided `placement`. This means we will want to add new
intrinsics that correspond to `tff.AGGREGATORS`, e.g.
`federated_value_at_aggregators`.
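
As a rough illustration, below is a minimal sketch of the new placement literal
and the placement-based dispatch, modeled on how `tff.SERVER` and `tff.CLIENTS`
are handled today. The constructor arguments, helper names, and dispatch
structure are assumptions for illustration, not current TFF internals.

```python
# A minimal sketch, assuming a PlacementLiteral constructor shaped like the
# one backing tff.SERVER and tff.CLIENTS; all names here are illustrative.
AGGREGATORS = PlacementLiteral(
    name='aggregators',
    uri='aggregators',
    default_all_equal=False,  # like CLIENTS, values are not assumed identical
    description='The collective of aggregation parties in a federated system.')

def federated_value(value, placement):
  # The compiler already dispatches to a per-placement intrinsic for SERVER
  # and CLIENTS; the AGGREGATORS branch is the proposed addition.
  if placement is SERVER:
    return federated_value_at_server(value)
  elif placement is CLIENTS:
    return federated_value_at_clients(value)
  elif placement is AGGREGATORS:
    return federated_value_at_aggregators(value)  # new intrinsic
  raise TypeError('Unsupported placement {}.'.format(placement))
```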

Existing federated computations that will need modification fall into the two
categories below:

1. Intrinsics for federated computations that are already parameterized by
placement. Note that some of these functions don't have a `placement` arg in their
public API signature, but internally correspond to different IntrinsicDefs based
on the placement of their federated input(s).
- `federated_eval`
- `federated_map`
- `federated_value`
- `federated_zip`
- `sequence_map`
2. Intrinsics that will need to be parameterized by placement, but currently
aren't.
- `federated_aggregate`

  > **Reviewer comment:** This will probably be the interesting one. I think the
  > accumulate / merge / report operators should be allowed to be parameterized
  > by the `tff.AGGREGATORS`-placed values (see the sketch after this list).

- `federated_broadcast`
- `federated_collect`
- `federated_mean`
- `federated_reduce`
- `federated_secure_sum`
- `federated_sum`
- `sequence_reduce`
- `sequence_sum`
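
Following the reviewer suggestion on `federated_aggregate`, the sketch below
shows what a placement-parameterized call might look like. The `placement`
argument, and the convention that `accumulate` and `merge` run on
`tff.AGGREGATORS` while `report` delivers the result to `tff.SERVER`, are
proposals under discussion, not current TFF API.

```python
import tensorflow as tf
import tensorflow_federated as tff

@tff.tf_computation(tf.float32, tf.float32)
def add(x, y):
  return x + y

@tff.tf_computation(tf.float32)
def identity(x):
  return x

# Hedged sketch: `placement=` does not exist in current TFF. Under this
# proposal, accumulate/merge would run on tff.AGGREGATORS and report would
# deliver the aggregate to tff.SERVER.
@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def sum_at_aggregators(client_values):
  return tff.federated_aggregate(
      client_values,
      zero=0.0,
      accumulate=add,
      merge=add,
      report=identity,
      placement=tff.AGGREGATORS)
```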

Intrinsics in the latter category will likely need further discussion. This is
because implementation details could change aspects of the underlying "federated
algebra", like closure, or could introduce subtle semantic changes.

As an example, assume we extend `federated_collect` to handle signatures of
`T@CLIENTS -> T*@AGGREGATORS` and `T@AGGREGATORS -> T*@SERVER` (in addition to the
current `T@CLIENTS -> T*@SERVER`). If we want to maintain algebraic closure, we
would extend `federated_broadcast` to handle `T@SERVER -> {T}@AGGREGATORS` and
`T@AGGREGATORS -> {T}@CLIENTS`; similarly, we would extend `sequence_reduce` to
handle values of type `T@AGGREGATORS`. In this scenario, the new
`federated_broadcast` would be a natural generalization of the old; however, it's
not clear whether this kind of semantic change would confuse users of the FC.
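
To make the closure property concrete, here is a minimal sketch that chains the
extended `federated_collect` through `tff.AGGREGATORS`, assuming the
hypothetical `placement` keyword argument proposed in this document; none of
this is current TFF API.

```python
import tensorflow as tf
import tensorflow_federated as tff

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def collect_through_aggregators(client_values):
  # First hop: T@CLIENTS -> T*@AGGREGATORS.
  per_aggregator = tff.federated_collect(
      client_values, placement=tff.AGGREGATORS)
  # Second hop: T@AGGREGATORS -> T*@SERVER, instantiated with T = float32*.
  return tff.federated_collect(per_aggregator, placement=tff.SERVER)
```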

We hope this will be a good starting point for discussion. Ultimately, the RFC
process should allow us to elaborate the exact type signatures that each of the new
IntrinsicDefs should satisfy.

### Alternatives Considered
A lower-effort alternative might be to expect users to write custom executors, or
custom executor stacks, to include additional "aggregator" parties when executing
intrinsics. "AGGREGATORS" would stay outside of the FC type system, but could
still be included in federated computations. This might allow library designers to
extend TFF for their own use cases, but it would hinder the majority of TFF users,
who should not be expected to learn the executor API.

We also briefly considered the name `tff.AGGREGATOR` instead of `tff.AGGREGATORS`.
We decided on the latter for two reasons:
1. `tff.AGGREGATOR` does not capture the possibility of multiple executor stacks
coordinating aggregation (the existing `ComposingExecutor` qualifies as one such
case).
2. `tff.AGGREGATOR` is equivalent to a singleton `tff.AGGREGATORS`.

### Performance Implications
This is an additive improvement to the FC, so there should be no performance
implications for existing functionality. TFF is designed to support this kind of
addition with minimal overhead. New functionality could be less performant relative
to current practices, but only from overhead inherent to adding a new node to a
distributed computation.

### Dependencies
This change brings no new dependencies. Since this proposal adds a new federated
type, any project that enforces limits based on the current federated types may
have to be updated. We will work with the TFF team to identify any affected
projects and limit any breaking changes.

### Engineering Impact
This code will likely bring marginal increases to build and test time, but changes
to binary size should be negligible. Executor factories including a stack for the
`tff.AGGREGATORS` placement will experience a nontrivial increase in startup time,
but not all executor factories will need to include a stack for this placement.

The code for this change will be mixed into existing modules in the TF Federated
core. Since it affects the type system used by the TFF compiler and requires that
relevant intrinsic definitions be modified to recognize a new placement, it will
touch many different places in the TFF stack. Those who already own and maintain
those code units will maintain and improve the change in the future, which makes
their feedback critical throughout design and implementation.

### Best Practices
The new `Placement` for federated types brings an addition to the Federated Core,
which will be communicated in the TFF API documentation. This will only be relevant
for users of the lower-level Federated Core, at least until a higher level API is
included that relies on it. Below, we detail how this change should be communicated
in existing tutorials.

### Tutorials and Examples
Since this is a modification of an existing API, it likely does not warrant a new
tutorial. We instead suggest modifying the existing
[Part 1 Federated Core tutorial (FC 1)](https://www.tensorflow.org/federated/tutorials/custom_federated_algorithms_1)
to include one or more federated computations that operate on Aggregator-placed
data. We also considered modifying part 2 of the FC tutorial, but decided against
that due to its stated goals.

Concretely, we recommend two modifications to the FC 1 tutorial:
- In the "Placement" section, the discussion will need to include the
`tff.AGGREGATORS` placement. This section should stress that the placement can be
considered optional, whereas the others (`tff.CLIENTS`, `tff.SERVER`) are strictly
necessary for most interesting federated computations.
- In the "Composing Federated Computations" section, we recommend adding a short
sub-section or paragraph that describes how one might refactor the
`get_average_temperature` function to perform its `federated_mean` with a placement
of `tff.AGGREGATORS`. We include short and long form examples below for
consideration.

```python
# Short form. Note: the `placement=` keyword arguments are proposed in this
# RFC and do not exist in current TFF.
@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
  # Compute the mean on the aggregators rather than on the server.
  averaged_temp = tff.federated_mean(sensor_readings, placement=tff.AGGREGATORS)
  # Move the aggregated result onto the server.
  return tff.federated_collect(averaged_temp, placement=tff.SERVER)

# Long form: illustrative pseudocode building the same computation from
# lower-level intrinsics. In practice, the client count would need to be
# computed as an aggregator-placed value rather than via Python's len().
@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
  collected_readings = tff.federated_collect(
      sensor_readings, placement=tff.AGGREGATORS)
  num_clients = len(collected_readings)
  total_temp = tff.sequence_sum(collected_readings)
  return tff.federated_map(lambda x: x / num_clients, total_temp)
```

### Compatibility

Since this design adds new functionality, it would change the public API. TFF is
still pre-1.0 and does not yet explicitly guarantee backwards compatibility of its
public API. Nevertheless, we hope to limit the impact on the public API through
judicious use of default keyword arguments.

Concretely, we can maintain backwards compatibility for federated computations that
gain a `placement` keyword argument by defaulting that argument to `tff.SERVER`. We
recommend _judicious_ use because there may be instances where a change in semantics
justifies a breaking change. These should be taken on a case-by-case basis, and we
hope to clearly define and justify any breaking changes that might arise.
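
As a concrete illustration of this strategy, the sketch below defaults the
hypothetical `placement` keyword to `tff.SERVER`, so existing call sites keep
their current behavior; the per-placement helper names are illustrative, not
current TFF internals.

```python
# Hypothetical sketch: today's federated_mean has no `placement` argument,
# and the helpers below are illustrative names only.
def federated_mean(value, weight=None, placement=tff.SERVER):
  if placement is tff.SERVER:
    return _federated_mean_at_server(value, weight)       # existing behavior
  elif placement is tff.AGGREGATORS:
    return _federated_mean_at_aggregators(value, weight)  # proposed intrinsic
  raise TypeError('Unsupported placement {}.'.format(placement))
```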

This design does not significantly impact compatibility with the rest of the TF
ecosystem.

## Questions and Discussion Topics

> **Reviewer comment:** Perhaps you can add a pointer here to an example of a
> current inefficiency that could be addressed using `tff.AGGREGATORS`.

- Which of the intrinsics above should actually be modified/parameterized?
- How strict should we be about algebraic closure in the federated type system?
There could be an argument against strict closure, e.g. if we want to limit which intrinsics can
ever involve `tff.AGGREGATORS`.
- Are the existing tutorial changes sufficient? What is the best way to communicate
these changes in existing documentation?
- What should the implementation/release strategy be? Should this wait until TFF
1.x.x?
- Once changes to current intrinsics have been planned, what qualifies as a
"judicious" use of defaults for maintaining backwards compatiblity?