docs: explain implementation (#55)
* docs: add implementation details and images

* fix: use relative file path

* fix: try absolute file path

* docs: try image path top level

* docs: use top level indentation only

* docs: add PSI description

* docs: add optimization description

* docs: remove image

* docs: improve optimization description

* docs: add About section to frontend README

* docs: add Testing section to README

* docs: remove image folder

* docs: use LaTeX instead of markdown for math text

* docs: fix typo

* docs: change some wording

* docs: use imgur link for image url

GitHub recently changed the image hosting system in issues. Links are no longer persistent, so they can't be reliably used in the README.
csirianni authored Dec 25, 2023
1 parent fb8adae commit 0f20e4d
Showing 3 changed files with 66 additions and 15 deletions.
22 changes: 20 additions & 2 deletions README.md
@@ -12,9 +12,27 @@ Private Data Lookup (PDL) is a web application that allows users to privately qu
> - [How Meta is improving password security and preserving privacy](https://engineering.fb.com/2023/08/08/security/how-meta-is-improving-password-security-and-preserving-privacy/)
> - [Data Breaches, Phishing, or Malware?: Understanding the Risks of Stolen Credentials](https://dl.acm.org/doi/10.1145/3133956.3134067)
## Implementation

### PSI

In Private Set Intersection, neither party reveals anything to their counterpart except for the elements in the intersection. This is accomplished using encryption. Hashed passwords are encrypted using secret key $a$ on the frontend and secret key $b$ on the backend. Querying the set of breached passwords is a three-step process:

1. The client sends an encrypted user password $\text{Hash}(p)^a$ to the server.
2. The server sends the re-encrypted user password $\text{Hash}(p)^{ab}$ and the encrypted breached passwords $\text{Hash}(b_1)^{b}, ...,\text{Hash}(b_n)^{b}$ to the client.
3. The client partially decrypts the user password using $a^{-1}$ and checks if $\text{Hash}(p)^{aba^{-1}} = \text{Hash}(p)^{b}$ is contained in the set of encrypted breached passwords.

If the set intersection is non-empty, the user's password is compromised and should not be used.
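
The three steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's code: it replaces the Libsodium group with modular exponentiation modulo the Mersenne prime $2^{61} - 1$, uses FNV-1a as a stand-in for $\text{Hash}$, and every function name (`hashToGroup`, `serverRespond`, `isBreached`, and so on) is hypothetical. The parameters are far too small to be secure.

```javascript
// Toy group (assumption): integers modulo the Mersenne prime 2^61 - 1.
const P = 2n ** 61n - 1n;
const ORDER = P - 1n; // exponents are taken modulo the group order

// Square-and-multiply modular exponentiation.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

function gcd(x, y) {
  while (y) [x, y] = [y, x % y];
  return x;
}

// Modular inverse via the extended Euclidean algorithm; needs gcd(a, m) = 1.
function modInverse(a, m) {
  let [r0, r1] = [a % m, m];
  let [s0, s1] = [1n, 0n];
  while (r1 !== 0n) {
    const q = r0 / r1;
    [r0, r1] = [r1, r0 - q * r1];
    [s0, s1] = [s1, s0 - q * s1];
  }
  return ((s0 % m) + m) % m;
}

// Pick the first key at or above `start` that is invertible modulo ORDER.
function pickKey(start) {
  let k = start;
  while (gcd(k, ORDER) !== 1n) k += 1n;
  return k;
}

// Stand-in for Hash(): 64-bit FNV-1a mapped into [1, P - 1]. Not cryptographic.
function hashToGroup(password) {
  let h = 0xcbf29ce484222325n;
  for (let i = 0; i < password.length; i++) {
    h ^= BigInt(password.charCodeAt(i));
    h = (h * 0x100000001b3n) % (1n << 64n);
  }
  return (h % ORDER) + 1n;
}

const a = pickKey(123456789n); // client secret (fixed here for repeatability)
const b = pickKey(987654321n); // server secret
const breached = ["password", "123456", "TestPass1&"];

// Step 2: the server re-encrypts the client's value and encrypts its own set.
function serverRespond(clientValue) {
  return {
    reEncrypted: modPow(clientValue, b, P),
    breachedSet: breached.map((pw) => modPow(hashToGroup(pw), b, P)),
  };
}

function isBreached(password) {
  const sent = modPow(hashToGroup(password), a, P); // step 1: Hash(p)^a
  const { reEncrypted, breachedSet } = serverRespond(sent);
  const unblinded = modPow(reEncrypted, modInverse(a, ORDER), P); // step 3: Hash(p)^b
  return breachedSet.includes(unblinded);
}
```

Calling `isBreached("TestPass1&")` returns `true` because that password is in the toy breached list. The server only ever sees $\text{Hash}(p)^a$, and the client only ever sees breached passwords raised to $b$.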

### Performance Optimization

The initial PSI implementation unreasonably increases critical path latency due to the size of the breached password dataset. To address this challenge, [k-anonymity](https://en.wikipedia.org/wiki/K-anonymity) is used. Passwords are partitioned into $k$ buckets based on one or more leaked bytes. Given $n$ leaked bytes, there are $k = (2^8)^n$ buckets, indexed from $0$ to $(2^8)^n - 1$. The client generates a partition index using $n$ leaked bytes, and then the server returns a smaller subset of the dataset. The result is a decrease in the number of serialized passwords per request and faster processing times.
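
As a rough sketch of the bucketing, the helpers below derive a partition index from the first $n$ bytes of a value and group entries by that index. The function names, the big-endian byte order, and the choice of which bytes are leaked are all assumptions made for illustration; the project's actual scheme may differ.

```javascript
// Hypothetical helpers; names and byte order are assumptions, not the project's API.

// Number of buckets for n leaked bytes: (2^8)^n.
function bucketCount(n) {
  return 256 ** n;
}

// Interpret the first n bytes as a big-endian integer in [0, 256^n - 1].
// Plain numbers stay exact for n <= 6, since 256^6 = 2^48 < 2^53.
function bucketIndex(bytes, n) {
  if (bytes.length < n) throw new RangeError("not enough bytes");
  let index = 0;
  for (let i = 0; i < n; i++) {
    index = index * 256 + bytes[i];
  }
  return index;
}

// Server side (sketch): group stored entries by bucket so that a request
// only needs to return the entries in one bucket.
function partition(entries, n) {
  const buckets = new Map();
  for (const entry of entries) {
    const key = bucketIndex(entry, n);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(entry);
  }
  return buckets;
}
```

For example, `bucketIndex(Uint8Array.of(0x01, 0x02, 0xff), 2)` evaluates to `258`, one of `bucketCount(2)` (that is, 65536) possible indices.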

This feature involves a tradeoff between user privacy and application performance. The key assumption is that the number of breached passwords is sufficiently large to not reveal identifiable information about individual users. Since real breached password datasets contain billions of passwords [[1](https://www.wired.com/story/collection-leak-usernames-passwords-billions/)], each bucket contains millions of passwords. Thus, the assumption holds and the increased leakage involves negligible privacy risk.
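
To make the bucket sizes concrete, here is a back-of-the-envelope estimate. It assumes a hypothetical dataset of $N = 2 \times 10^9$ passwords (an illustrative figure, not a measurement) and that the leaked bytes are roughly uniformly distributed, so each bucket holds about $N / (2^8)^n$ entries:

$$
\frac{N}{(2^8)^n} = \frac{2 \times 10^9}{256} \approx 7.8 \times 10^6 \quad (n = 1), \qquad \frac{2 \times 10^9}{65536} \approx 3.1 \times 10^4 \quad (n = 2)
$$

Under these assumptions, a single leaked byte keeps every bucket in the millions, while each additional byte shrinks the buckets by a factor of 256.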

## Instructions

It's necessary to configure the `/frontend` and `/backend` folders initially. See the respective `README.md`s for more information. After configuration, you can run the application using the following commands.
You need to configure the `/frontend` and `/backend` folders initially. See the respective `README.md`s for more information. After configuration, you can run the application using the following commands.

To run the frontend, `cd` into `/frontend` and run

@@ -41,4 +59,4 @@ If you want to build a new database from a new or existing path, you can use the
build/src/server <database filepath> --build
```

Ensure that the backend is running with the frontend, otherwise you will see a server error on the front-end website.
Ensure that the backend is running with the frontend, otherwise you will see a server error in the web application.
23 changes: 14 additions & 9 deletions backend/README.md
@@ -2,7 +2,7 @@

## About

The backend hosts the breached passwords for use in the Private Set Intersection computation. The user sends a password and receives that password and the set of breached passwords both encrypted with secret key `b`. The backend uses [Crow](https://crowcpp.org/master/) for the REST API, [SQLite3](https://www.sqlite.org/index.html) for the breached password database, and [Libsodium](https://libsodium.gitbook.io/doc/) for the cryptography. The API has a single endpoint:
The backend hosts the breached passwords for use in the Private Set Intersection computation. The user sends a password and receives that password and the set of breached passwords both encrypted with secret key $b$. The backend uses [Crow](https://crowcpp.org/master/) for the REST API, [SQLite3](https://www.sqlite.org/index.html) for the breached password database, and [Libsodium](https://libsodium.gitbook.io/doc/) for the cryptography. The API has a single endpoint:

### Encrypt user password and get breached passwords

@@ -14,7 +14,7 @@ This endpoint encrypts the user's password and provides a list of encrypted brea

`body` *Required*

The leaked bytes followed by the user's encrypted password. In general, for `n` leaked bytes, the length of the data is `32 + n`:
The leaked bytes followed by the user's encrypted password. In general, for $n$ leaked bytes, the length of the data is $32 + n$:

```text
0 n 32 + n
@@ -59,34 +59,39 @@ const response = await fetch(

In `/backend`, start by installing [Conan](https://conan.io/):

```bash
```console
brew install conan
```

Then, install the project's packages using the following:

```console
conan install . --output-folder=build --build=missing
```

If Conan reports that no default profile exists, create one with `conan profile detect`.

If you haven't installed CMake already, do so now:

```bash
```console
brew install cmake
```

Next, link and compile the program:

```bash
```console
make build
```

From `/backend`, start the server. Use the `--build` flag to create or rebuild a database for the breached passwords:

```bash
```console
build/src/server <database filepath> --build
```

Or, omit the `--build` flag to use an existing database:

```bash
```console
build/src/server data/passwords.db
```

@@ -104,12 +109,12 @@ To fix VS Code import errors, try adding the following line to your `settings.js

After building, you can run tests from `/backend`:

```bash
```console
cd build && ./test/pdl_test
```

or alternatively:

```bash
```console
make check
```
36 changes: 32 additions & 4 deletions frontend/README.md
@@ -1,9 +1,17 @@
# Frontend

Made with [Next.js](https://nextjs.org/).
## About

The frontend renders the client web application and computes the private set intersection. It is made with [Next.js](https://nextjs.org/), [Tailwind](https://tailwindcss.com/), and [MaterialUI](https://mui.com/material-ui/).

![Sign up page](https://i.imgur.com/8sea2io.png)

Note that a password must satisfy the listed requirements before the user can click "Sign Up."

## Configuration

Make sure you have [Yarn](https://yarnpkg.com/) and [Node](https://nodejs.org/en) installed.

To run the frontend server, use your preferred terminal to `cd` into `/frontend` and then install the required packages by running

```bash
@@ -18,8 +26,28 @@ yarn dev

Open [http://localhost:3000](http://localhost:3000) to view it in the browser. The page will reload if you make edits.

## Deploy on Vercel
## Testing

You can run tests from `/frontend`:

```console
yarn test
```

This command runs integration tests, so make sure the backend is running and that the frontend and backend use the same number of leaked bytes. For example,

`/frontend/tests/psi.test.ts`

The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.
```javascript
test("sending breached password should return fail status", async () => {
  const password = "TestPass1&";
  const response = await checkSecurity(password, 1);
  expect(response.status).toBe("fail");
});
```

`/backend/src/main.cpp`

Check out our [Next.js deployment documentation](https://nextjs.org/docs/deployment) for more details.
```cpp
const size_t offset = 1;
```
