# fix: make prv accountant robust to larger epsilons (pytorch#606)
Summary:
## Types of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue

Hi,

This PR fixes pytorch#601 and pytorch#604.

It introduces the same fix as in microsoft/prv_accountant#38. Lukas (wulu473, the author of the prv accountant) said that "In general, adding any additional points is safe and won't affect the robustness negatively."

The cause of these errors appears to be the grid used to compute the `mean()` of the `PrivacyRandomVariableTruncated` class. The grid (the `points` variable) is fixed apart from its lowest (`self.t_min`) and highest (`self.t_max`) points.

This PR instead derives the grid (the `points` variable) from the lowest and highest points. More information is below.

Best

**Observation**

While debugging, I eventually arrived at the `mean()` function of the `PrivacyRandomVariableTruncated` class. The grid (the `points` variable) used to compute the mean is fixed apart from the lowest (`self.t_min`) and highest (`self.t_max`) points. See the line of code [here](https://github.com/microsoft/prv_accountant/blob/a95c4e2d41ff4886c3e4a84925edf878a6540e0a/prv_accountant/privacy_random_variables/abstract_privacy_random_variable.py#L52). It looks like this: `[self.t_min, -0.1, -0.01, -0.001, -0.0001, -1e-05, 1e-05, 0.0001, 0.001, 0.01, 0.1, self.t_max]`.
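
For intuition, here is a minimal sketch of a piecewise numerical integration over such a grid. It is only a stand-in: the density `f` below is a standard normal rather than the actual privacy random variable, and the real `mean()` implementation may differ in detail.

```
import numpy as np
from scipy import integrate
from scipy.stats import norm

t_min, t_max = -12.0, 12.0

# fixed grid from the linked code: endpoints plus log-spaced points near zero
points = np.concatenate(
    [[t_min], -np.logspace(-5, -1, 5)[::-1], np.logspace(-5, -1, 5), [t_max]]
)

f = norm.pdf  # stand-in density, NOT the real PRV density

# accumulate the integral of t * f(t) over consecutive grid points
mean = sum(
    integrate.quad(lambda t: t * f(t), a, b)[0]
    for a, b in zip(points[:-1], points[1:])
)
print(mean)  # ~0 for this symmetric stand-in density
```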

It seems that `tmin` and `tmax` are on the order of `[-12, 12]` for the examples I posted above, and even up to `[-48, 48]` for the example that jeandut posted in the pytorch#604 issue, whereas they are more like `[-7, 7]` for the [readme example for DP-SGD](https://github.com/microsoft/prv_accountant#dp-sgd).

We suspect that the integration breaks down when the grid spacing near `tmin` / `tmax` gets too large.
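
To make this concrete, here is a small check (not part of the PR) of the spacing in the fixed grid for a wide truncation range such as `[-48, 48]`:

```
import numpy as np

t_min, t_max = -48.0, 48.0
points = np.concatenate(
    [[t_min], -np.logspace(-5, -1, 5)[::-1], np.logspace(-5, -1, 5), [t_max]]
)
print(np.diff(points).max())  # ~47.9: one huge subinterval from t_min to -0.1
```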

**Proposed solution**

Determine the `points` grid based on `tmin` and `tmax`: keep the same overall structure, but derive the start and end exponents of the logspace from `tmin` and `tmax`.

Before: see [the previous implementation](https://github.com/pytorch/opacus/blob/95df0904ae5d2b3aaa26b708e5067e9271624036/opacus/accountants/analysis/prv/prvs.py#L99-L106).

After:
```
# determine points based on t_min and t_max
lower_exponent = int(np.log10(np.abs(self.t_min)))
upper_exponent = int(np.log10(self.t_max))
points = np.concatenate(
    [
        [self.t_min],
        -np.logspace(start=lower_exponent, stop=-5, num=10),
        [0],
        np.logspace(start=-5, stop=upper_exponent, num=10),
        [self.t_max],
    ]
)
```
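
For illustration, with the same hypothetical `[-48, 48]` range as above, the adaptive grid extends its log-spaced points out to the nearest power of ten and shrinks the largest gap:

```
import numpy as np

t_min, t_max = -48.0, 48.0
lower_exponent = int(np.log10(np.abs(t_min)))  # 1
upper_exponent = int(np.log10(t_max))  # 1
points = np.concatenate(
    [
        [t_min],
        -np.logspace(start=lower_exponent, stop=-5, num=10),
        [0],
        np.logspace(start=-5, stop=upper_exponent, num=10),
        [t_max],
    ]
)
print(np.diff(points).max())  # ~38: down from ~47.9 with the fixed grid
```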

## How Has This Been Tested (if it applies)

I ran the examples from issues pytorch#601 and pytorch#604 and they no longer break.

```
import opacus

target_delta = 0.001
steps = 5000
sample_rate = 0.19120458891013384

for target_epsilon in [20, 50]:
    noise_multiplier = opacus.privacy_engine.get_noise_multiplier(
        target_delta=target_delta,
        target_epsilon=target_epsilon,
        steps=steps,
        sample_rate=sample_rate,
        accountant="prv",
    )
    prv_accountant = opacus.accountants.utils.create_accountant("prv")
    prv_accountant.history = [(noise_multiplier, sample_rate, steps)]
    obtained_epsilon = prv_accountant.get_epsilon(delta=target_delta)
    print(f"target epsilon {target_epsilon}, obtained epsilon {obtained_epsilon}")
```
> target epsilon 20, obtained epsilon 19.999332284974717
> target epsilon 50, obtained epsilon 49.99460075990896

```
import opacus

target_epsilon = 4
batch_size = 50
epochs = 5
target_delta = 1e-05
expected_len_dataloader = 500 // batch_size
sample_rate = 1 / expected_len_dataloader

noise_multiplier = opacus.privacy_engine.get_noise_multiplier(
    target_delta=target_delta,
    target_epsilon=target_epsilon,
    epochs=epochs,
    sample_rate=sample_rate,
    accountant="prv",
)
prv_accountant = opacus.accountants.utils.create_accountant("prv")
prv_accountant.history = [(noise_multiplier, sample_rate, int(epochs / sample_rate))]
obtained_epsilon = prv_accountant.get_epsilon(delta=target_delta)
print(f"target epsilon {target_epsilon}, obtained epsilon {obtained_epsilon}")
```
> target epsilon 4, obtained epsilon 3.9968389923130356

## Checklist

- [x] The documentation is up-to-date with the changes I made.
- [x] I have read the **CONTRIBUTING** document and completed the CLA (see **CONTRIBUTING**).
- [ ] All tests passed, and additional code has been covered with new tests.

I was not able to run all tests locally and am unsure whether new tests should be added.

Pull Request resolved: pytorch#606

Reviewed By: HuanyuZhang

Differential Revision: D50111887

fbshipit-source-id: 2f77f8bc0e59837f765b87f2e107bc01015b9481
Solosneros authored and facebook-github-bot committed Nov 28, 2023
1 parent 3d622d0 commit ad084da
8 changes: 6 additions & 2 deletions opacus/accountants/analysis/prv/prvs.py

```
@@ -96,11 +96,15 @@ def mean(self) -> float:
         """
         Calculate the mean using numerical integration.
         """
+        # determine points based on t_min and t_max
+        lower_exponent = int(np.log10(np.abs(self.t_min)))
+        upper_exponent = int(np.log10(self.t_max))
         points = np.concatenate(
             [
                 [self.t_min],
-                -np.logspace(-5, -1, 5)[::-1],
-                np.logspace(-5, -1, 5),
+                -np.logspace(start=lower_exponent, stop=-5, num=10),
+                [0],
+                np.logspace(start=-5, stop=upper_exponent, num=10),
                 [self.t_max],
             ]
         )
```
